DAE - Class 2

Data Science & Business Analytics

Data Analysis and Exploration

Alexandre Gomes Baptista


Class Schedule
Class  Date        Time                      Topic                                    Instructor
1      07/11/2023  18:00-20:00, 20:30-22:30  Corporate BI – Azure Analysis Services   Alexandre Baptista
2      08/11/2023  18:00-20:00, 20:30-22:30  Lab Practice in Azure Analysis Services  Alexandre Baptista
3      14/11/2023  18:00-20:00, 20:30-22:30  Lab Practice in Azure Analysis Services  Alexandre Baptista
4      15/11/2023  18:00-20:00, 20:30-22:30  DAX - Data Analysis Expressions          Alexandre Baptista
5      21/11/2023  18:00-20:00, 20:30-22:30  DAX - Data Analysis Expressions          Alexandre Baptista
6      22/11/2023  18:00-20:00, 20:30-22:30  Lab Practice in Azure Analysis Services  Alexandre Baptista
7      28/11/2023  18:00-20:00, 20:30-22:30  Lab Practice in Excel                    Alexandre Baptista
Exam   12/12/2023  19:00-20:00               Exam, regular sitting                    Alexandre Baptista
Exam                                         Exam, resit sitting                      Alexandre Baptista


Contents
• SSAS Concepts
• Exercise 1 – Build a Semantic Model
  • Create a Tabular project
  • Importing Data into the Project
  • Set Date Table
  • Relations
  • Calculated Column
SSAS Concepts – Date Table
To use time-intelligence functions in DAX formulas, you must specify a date table and a unique identifier (datetime) column of
the Date data type. Once a column in the date table is specified as a unique identifier, you can create relationships between columns in the
date table and any fact tables.

• PARALLELPERIOD - Returns a table that contains a column of dates
that represents a period parallel to the dates in the specified dates
column, in the current context, with the dates shifted a number of
intervals either forward in time or back in time.
• PREVIOUSDAY - Returns a table that contains a column of all dates
representing the day that is previous to the first date in the dates
column, in the current context.
• NEXTDAY - Returns a table that contains a column of all dates from
the next day, based on the first date specified in the dates column in
the current context.
• SAMEPERIODLASTYEAR - Returns a table that contains a column
of dates shifted one year back in time from the dates in the
specified dates column, in the current context.
• DATESYTD - Returns a table that contains a column of the dates for
the year to date, in the current context.
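
As a minimal sketch of how these functions are typically used, the measures below wrap a sum in CALCULATE and pass a time-intelligence function as the filter. The measure names are hypothetical, and FactInternetSales[SalesAmount] and DimDate[Date] are assumed to match the lab model built later in this class, with DimDate marked as the date table.

```dax
-- Hypothetical measures; assume DimDate is marked as the date table
-- and has an active relationship to FactInternetSales.
Sales YTD :=
CALCULATE (
    SUM ( FactInternetSales[SalesAmount] ),
    DATESYTD ( DimDate[Date] )
)

Sales Same Period Last Year :=
CALCULATE (
    SUM ( FactInternetSales[SalesAmount] ),
    SAMEPERIODLASTYEAR ( DimDate[Date] )
)

-- PARALLELPERIOD shifts by whole intervals; -1 moves one quarter back.
Sales Previous Quarter :=
CALCULATE (
    SUM ( FactInternetSales[SalesAmount] ),
    PARALLELPERIOD ( DimDate[Date], -1, QUARTER )
)
```

Each function returns a table of dates, which CALCULATE applies as a filter on the date column before evaluating the sum.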
SSAS Concepts – Cardinality Types
Each model relationship is defined by a cardinality type. There are four cardinality type options, representing the data characteristics of the
"from" and "to" related columns. The "one" side means the column contains unique values; the "many" side means the column can
contain duplicate values.

Many-to-one (*:1) & One-to-many (1:*)
The one-to-many and many-to-one cardinality options are essentially the same, and they're also the most common cardinality types.
When configuring a one-to-many or many-to-one relationship, you'll choose the one that matches the order in which you related the columns. Consider how you would configure the relationship from the Product table to the Sales table by using the ProductID column found in each table. The cardinality type would be one-to-many, because the ProductID column in the Product table contains unique values; relating the tables in the reverse direction, Sales to Product, would make it many-to-one.

One-to-one (1:1)
A one-to-one relationship means both columns contain unique values. This cardinality type isn't common, and it likely represents a suboptimal model design because of the storage of redundant data.

Many-to-many (*:*)
A many-to-many relationship means both columns can contain duplicate values. This cardinality type is infrequently used. It's typically useful when designing complex model requirements. You can use it to relate many-to-many facts or to relate higher grain facts. For example, when sales target facts are stored at product category level and the product dimension table is stored at product level.
SSAS Concepts – Cross & Bi-Directional Filtering
Cross filtering is the ability to set a filter context on a table based on values in a related table, and bi-directional filtering is the transfer
of a filter context to a second related table on the other side of a table relationship. As the name implies, you can slice in both
directions of the relationship rather than just one way. Internally, two-way filtering expands filter context to query a superset of your data.

Figure - Configure Bi-Directional Filtering and Cross-Filtering
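
Besides configuring the relationship property in the designer, bi-directional filtering can be enabled for a single calculation with the CROSSFILTER function. The sketch below is illustrative only: the measure name is hypothetical, and it assumes a relationship exists between FactInternetSales[CustomerKey] and DimCustomer[CustomerKey].

```dax
-- Hypothetical measure. With the direction set to BOTH, filters that reach
-- FactInternetSales (for example from DimProduct) also flow on to DimCustomer,
-- so the count covers only customers who have matching sales rows.
Customers With Sales :=
CALCULATE (
    DISTINCTCOUNT ( DimCustomer[CustomerKey] ),
    CROSSFILTER ( FactInternetSales[CustomerKey], DimCustomer[CustomerKey], BOTH )
)
```

Scoping the two-way filter to one measure leaves the relationship one-directional for every other calculation in the model.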
SSAS Concepts – Calculated Columns
When you create a data model you can extend a table by creating new columns. The content of the columns is defined by a DAX
expression evaluated row by row. You can rename the new column before or after defining the expression by right-clicking the new
column and selecting the Rename Column menu item. The DAX formula you write does not contain the
column name and starts with the assignment symbol (=).

A calculated column is just like any other column in a table and you can use it in any part of a report. You can also use a calculated
column to define a relationship if needed. The DAX expression defined for a calculated column operates in the context of the current
row across that table. Any reference to a column returns the value of that column for the current row. You cannot directly access the values
of other rows.

One important concept that you need to remember about calculated columns is that they are computed during the database
processing and then stored in the model. This might seem strange if you are accustomed to SQL-computed columns – not persisted –
which are computed at query time and do not use memory. In data models for DAX, however, all calculated columns occupy space in
memory and are computed during table processing.

This behavior is helpful whenever you create very complex calculated columns. The time required to
compute them is always process time and not query time, resulting in a better user experience.
SSAS Concepts – Calculated Measures
There is another way of defining calculations in a DAX model, useful whenever you do not want to compute values for each row but,
rather, you want to aggregate values from many rows in a table. These calculations are measures. A measure needs to be defined in
a table. This is one of the requirements of the DAX language. However, the measure does not really belong to the table. In fact, you can
move a measure from one table to another one without losing its functionality.
Gross Margin % := DIVIDE ( SUM ( Sales[GrossMargin] ), SUM ( Sales[SalesAmount] ) )

Measures and calculated columns both use DAX expressions. The difference is the context of evaluation. A measure is evaluated in
the context of the cell evaluated in a report or in a DAX query, whereas a calculated column is computed at the row level within
the table it belongs to.

The context of the cell depends on user selections in the report or on the shape of the DAX query. So when you use:

• SUM(Sales[SalesAmount]) in a measure, you mean the sum of all the cells that are aggregated under this cell,
• Sales[SalesAmount] in a calculated column, you mean the value of the SalesAmount column in the current row.
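
To make the difference in evaluation context concrete, here is a small sketch that computes the same ratio both ways. The column name GrossMarginPct is hypothetical; the Sales table and its columns follow the Gross Margin % example above.

```dax
-- Calculated column on the Sales table: evaluated once per row and stored.
-- Summing or averaging this column in a report does not give the overall margin %.
Sales[GrossMarginPct] = DIVIDE ( Sales[GrossMargin], Sales[SalesAmount] )

-- Measure: evaluated in the filter context of each report cell,
-- so the ratio is taken after aggregating the rows visible in that cell.
Gross Margin % := DIVIDE ( SUM ( Sales[GrossMargin] ), SUM ( Sales[SalesAmount] ) )
```

Ratios and percentages are usually better defined as measures, because they must be recomputed from aggregated values for every cell.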
SSAS Concepts – Related and RelatedTable
The RELATED and RELATEDTABLE functions navigate the chain of relationships in Power BI, Power Pivot, and SSAS
Tabular to retrieve related data from another table.

RELATED(Column) follows existing many-to-one relationship(s) from the many side to the one side and returns the single matching value
from the other table. In other words, RELATED can access the one side from the many side because at most one matching row exists in the
related table; if no matching row exists, RELATED returns BLANK.

Add calculated columns on the many side that read from the one side:
ProductName=RELATED(Products[ProductName])
ProductCategory=RELATED(Categories[ProductCategoryName])

PRODUCT (one side, 1)
ProdID  prodName     pvp
101     Bicicleta 1  10€
102     Bicicleta 2  60€
103     Bicicleta 3  10€

SALES (many side, *), with the calculated column prodName added
ID  ProdID  Quant  Value  prodName
1   101     1      10€    Bicicleta 1
2   102     2      120€   Bicicleta 2
3   101     6      60€    Bicicleta 1
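
RELATED is not limited to copying a label; it can also feed a row-level calculation. The sketch below uses the PRODUCT and SALES tables from the diagram and a hypothetical column name, ValueCheck; it assumes pvp is stored as a number and that the tables are related on ProdID.

```dax
-- Calculated column on SALES (many side). For each SALES row, RELATED reads
-- pvp from the single matching PRODUCT row (one side).
'SALES'[ValueCheck] = 'SALES'[Quant] * RELATED ( 'PRODUCT'[pvp] )
```

For the three sample rows this yields 1 × 10, 2 × 60, and 6 × 10, which matches the Value column shown in the SALES table.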
SSAS Concepts – Related and RelatedTable
RELATEDTABLE(Table) follows a relationship in either direction (many-to-one or one-to-many) and returns a table containing all the rows
that are related to the current row from the specified table. This is very useful when you want to find all the transactions associated with a
particular row of a related table.

Total Products = COUNTROWS(RELATEDTABLE(FactSales))
Total Sales Amount = SUMX(RELATEDTABLE(FactSales), FactSales[SalesAmount])
PRODUCT (one side, 1), with the calculated columns # Sales and TotalSales added
ProdID  prodName     pvp  # Sales  TotalSales
101     Bicicleta 1  10€  2        70€
102     Bicicleta 2  60€  1        120€
103     Bicicleta 3  10€  0        0€

SALES (many side, *)
ID  ProdID  Quant  Value
1   101     1      10€
2   102     2      120€
3   101     6      60€
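
Applied to the PRODUCT and SALES tables in the diagram, the two calculated columns could be sketched as follows. The column names SalesCount and SalesCountOrZero are hypothetical, and a relationship on ProdID is assumed.

```dax
-- Calculated columns on PRODUCT (one side). RELATEDTABLE returns the SALES
-- rows related to the current PRODUCT row.
'PRODUCT'[SalesCount] = COUNTROWS ( RELATEDTABLE ( 'SALES' ) )
'PRODUCT'[TotalSales] = SUMX ( RELATEDTABLE ( 'SALES' ), 'SALES'[Value] )

-- COUNTROWS returns BLANK when there are no related rows (product 103);
-- adding 0 converts that BLANK to 0 if a zero should be displayed.
'PRODUCT'[SalesCountOrZero] = COUNTROWS ( RELATEDTABLE ( 'SALES' ) ) + 0
```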
Creating an Analysis Services Tabular Project
In this task, you will create an Analysis Services Tabular project.
1. To create a new AW Internet Sales solution, on the File menu, select New | Project
2. In the Add New Project window, in the Installed Templates pane, select Analysis Services located under the Business Intelligence
list.
3. Select the Analysis Services Tabular Project template.

Figure - Selecting the Analysis Services Tabular Project Template

4. In the Name box, replace the text with dae11_aln_[nAluno], and then click OK.
5. In the Tabular model designer, specify Integrated Workspace and the compatibility level SQL Server 2022, then click OK.
6. In Solution Explorer, notice the project consists of a single item named Model.bim. This item is the model you will
develop in this lab.
7. Notice that the Model.bim item was automatically opened upon project creation.
8. To save the project, on the File menu, select Save All.
Note: Each Tabular project consists of a single model, and no additional models can be added. When deployed, the project
creates a database on the target Analysis Services instance, and the model can be queried.
Importing Data into the Project
In this task, you will import data into the tabular model you created earlier.
1. In Tabular Model Explorer, right-click Data Sources > Import from Data Source.
2. In the Get Data window, under Azure, click Azure SQL Database > Connect.
3. In Server, type dsba-datawarehouse.database.windows.net.
4. In the Database field, select Datawarehouse, and then click OK.
5. In the SQL Server Database page, select Database to specify the credentials
Analysis Services will use to connect to the data source when importing and
processing data. For the username, enter dwreader; for the password, enter
idefedsba01!; then click OK.
6. In the Select Tables and Views page, select the check box for the following tables:
DimCurrency, DimCustomer, DimDate, DimGeography, DimProduct,
DimProductCategory, DimProductSubcategory, and FactInternetSales.
7. Click Transform Data (not Load)
Importing Data into the Project
In this task, you will filter the data to be imported into the tabular model
The DimCustomer table that you're importing from the sample database contains a subset of the data from the
original SQL Server Adventure Works database. You will filter out some more of the columns from the
DimCustomer table that aren’t necessary when imported into your model. When possible, you'll want to filter
out data that won't be used in order to save in-memory space used by the model.

1. The Power Query Editor window opens. Ensure the DimCustomer source table is selected.
2. Clear the checkbox at the top / delete the following columns: SpanishEducation, FrenchEducation,
SpanishOccupation, FrenchOccupation.

Since the values for these columns are not relevant to Internet sales analysis, there is
no need to import these columns. Eliminating unnecessary columns will make your
model smaller and more efficient. Notice the list of applied steps on the
right: the full history of data transformations is stored there for future reference or change.
Importing Data into the Project
In this task, you will filter the data to be imported into the tabular model. Eliminate
the following columns:

DimGeography: SpanishCountryRegionName, FrenchCountryRegionName, IpAddressLocator
DimDate: DateKey, SpanishDayNameOfWeek, FrenchDayNameOfWeek, SpanishMonthName, FrenchMonthName
DimProduct: SpanishProductName, FrenchProductName, FrenchDescription, ChineseDescription, ArabicDescription,
HebrewDescription, ThaiDescription, GermanDescription, JapaneseDescription, TurkishDescription
DimProductCategory: SpanishProductCategoryName, FrenchProductCategoryName
DimProductSubcategory: SpanishProductSubcategoryName, FrenchProductSubcategoryName
FactInternetSales: OrderDateKey, DueDateKey, ShipDateKey
Importing Data into the Project
In this task, you will import the selected tables and column data.
Now that you've previewed and filtered out unnecessary data, you can import the rest of the data you do
want. The wizard imports the table data along with any relationships between tables. New tables and
columns are created in the model and data that you filtered out will not be imported.

To import the selected tables and column data


1. Review your selections. If everything looks okay, click Import.
2. If requested insert the database credentials again:
Username: dwreader
Password: idefedsba01!

While importing the data, the wizard displays how many rows have been
fetched. When all the data has been imported, a message indicating success
is displayed.

3. When the import finishes, click the Close button.


Mark as Date Table
In the last exercise you imported a dimension table named DimDate. While in your model this table is named
DimDate, it can also be known as a Date table, in that it contains date and time data. Whenever you use DAX
time-intelligence functions in calculations, as you'll do when you create measures a little later, you must
specify date table properties, which include a Date table and a unique identifier Date column in that table.

In this task, you will mark DimDate as a Date Table


Before we mark the date table and date column, we need to do a little housekeeping to make our model easier to understand.
You'll notice in the DimDate table a column named FullDateAlternateKey. It contains one row for every day in each calendar
year included in the table. FullDateAlternateKey is not really a good identifier for this column. We'll rename it to Date,
making it easier to identify and include in formulas. It's a good idea to rename objects like tables and columns to make them
easier to identify in client reporting applications like Power BI and Excel.

To rename the FullDateAlternateKey column


1. In the model designer, click the DimDate table.
2. Double click the header for the FullDateAlternateKey column, and then rename it to Date.

To mark DimDate as a Date table


1. Select the Date column, and then in the Properties window, under Data Type, make sure Date is selected.
2. Click the Extensions -> Table menu, then click Date, and then click Mark as Date Table.
3. Click OK.
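
Marking a date table requires the chosen column to hold unique values. As an optional sanity check, a DAX query along the following lines can be run against the model from a query tool such as SQL Server Management Studio; this is a sketch that assumes the column has already been renamed to Date, and the output column labels are arbitrary.

```dax
-- DAX query (not a measure). If the two numbers differ, the Date column
-- contains duplicates and cannot serve as the unique identifier.
EVALUATE
ROW (
    "Rows in DimDate", COUNTROWS ( DimDate ),
    "Distinct dates", DISTINCTCOUNT ( DimDate[Date] )
)
```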
Create Relationships
In this task, you will review existing relationships and add new relationships.
When you imported data by using the Table Import Wizard, existing relationships were normally imported automatically together
with the data. However, before you proceed with authoring your model, you should verify that those relationships between tables were
created properly.

1. Click the Extensions > Model menu > Model View > Diagram View.

The model designer now appears in Diagram View, a graphical format
displaying all of the tables you imported with lines between them. The
lines between tables indicate the relationships that were automatically
created when you imported the data.

Use the minimap controls in the lower-right corner of the model
designer to adjust the view to include as many of the tables as
possible. You can also click and drag tables to different locations,
bringing tables closer together, or putting them in a particular order.

Moving tables does not affect the relationships already between the
tables. To view all of the columns in a particular table, click and drag
on a table edge to expand or make it smaller.
Create Relationships
In this task, you will review existing relationships and add new relationships.
1. Create the relationships listed in the relationships table in the accompanying figure.
2. Click the solid line between the FactInternetSales table and the DimProduct table.
The solid line between these two tables shows this relationship is active, that is, it is used by default when
calculating DAX formulas. Notice that the ProductKey column in the FactInternetSales table and the ProductKey
column in the DimProduct table now each appear within a box. This shows these are the columns used in the
relationship. The relationship's properties now also appear in the Properties window.
3. Use the model designer in diagram view, or the Manage Relationships dialog box, to verify that the relationships
in that table were created when each of the tables was imported. If any of them are missing, create them manually.

Note: In addition to using the model designer in diagram view, you can also use the Manage Relationships
dialog box to show the relationships between all tables in a table format. Right-click Relationships in
Tabular Model Explorer, and then click Manage Relationships. The Manage Relationships dialog box
shows the relationships that were automatically created when you imported data.
Create Relationships
In this task, you will add new relationships
1. In the model designer, in the FactInternetSales table, click and hold on the
OrderDate column, then drag the cursor to the Date column in the DimDate
table, and then release.
A solid line appears showing you have created an active relationship between the
OrderDate column in the FactInternetSales table and the Date column in the DimDate
table.

2. In the FactInternetSales table, click and hold on the DueDate column, then drag
the cursor to the Date column in the DimDate table, and then release.
A dotted line appears showing you have created an inactive relationship between
the DueDate column in the FactInternetSales table and the Date column in the
DimDate table. You can have multiple relationships between tables, but only one
relationship can be active at a time.

3. Finally, create one more relationship: in the FactInternetSales table, click and hold
on the ShipDate column, then drag the cursor to the Date column in the DimDate
table, and then release.
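
An inactive relationship is ignored by default, but a measure can switch to it for one calculation with USERELATIONSHIP. The sketch below assumes the DueDate and ShipDate relationships created in the steps above; the measure names are hypothetical.

```dax
-- Hypothetical measures. USERELATIONSHIP activates the named inactive
-- relationship for this calculation only, in place of the OrderDate one.
Sales by Due Date :=
CALCULATE (
    SUM ( FactInternetSales[SalesAmount] ),
    USERELATIONSHIP ( FactInternetSales[DueDate], DimDate[Date] )
)

Sales by Ship Date :=
CALCULATE (
    SUM ( FactInternetSales[SalesAmount] ),
    USERELATIONSHIP ( FactInternetSales[ShipDate], DimDate[Date] )
)
```

This lets a single DimDate table serve all three date roles without duplicating the table in the model.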
Create Calculated Columns

1. Click the Extensions > Model menu > Model View > Data View. Calculated columns can only be created by using the
model designer in Data View.
2. In the model designer, click the DimDate table (tab).
3. Right-click the CalendarQuarter column header, and then click Insert Column. A new column named Calculated Column
1 is inserted to the left of the CalendarQuarter column.
4. In the formula bar above the table, type the following formula. AutoComplete helps you type the fully qualified names of
columns and tables and lists the functions that are available.

=RIGHT(" " & FORMAT([MonthNumberOfYear],"0#"), 2) & " - " & [EnglishMonthName]

Values are then populated for all the rows in the calculated column. If you scroll down through the table, you will
see that rows can have different values for this column, based on the data that is in each row.

5. Rename this column to MonthCalendar
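
The purpose of the formula is to prefix the month name with a two-digit month number, so that the text sorts in calendar order rather than alphabetically. As a hedged illustration of the same idea, the hypothetical column below uses the "00" format string to do the zero padding directly; it is an alternative sketch, not the formula used in the lab.

```dax
-- Hypothetical alternative column with the same intent as MonthCalendar.
-- "00" pads the month number to two digits, so month 1 with
-- EnglishMonthName "January" gives "01 - January".
DimDate[MonthCalendarAlt] =
FORMAT ( DimDate[MonthNumberOfYear], "00" ) & " - " & DimDate[EnglishMonthName]
```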


Create Calculated Columns
In this task, you will create a new calculated column named DayOfWeek
1. With the DimDate table still active, click on the Extensions > Column menu, then click Add Column.
2. In the formula bar, type the following formula:

=RIGHT(" " & FORMAT([DayNumberOfWeek],"0#"), 2) & " - " & [EnglishDayNameOfWeek]

3. When you've finished building the formula, press ENTER. The new column is added to the
far right of the table.

4. Rename the column to DayOfWeek.


5. Click on the column heading, and then drag the column between the EnglishDayNameOfWeek
column and the DayNumberOfMonth column.
Create Calculated Columns
Create 2 calculated columns – ProductSubcategoryName & ProductCategoryName
ProductSubcategoryName calculated column
1. In the DimProduct table, scroll to the far right of the table. Notice the right-most column is named Add Column
(italicized), click the column heading.
2. In the formula bar, type the following formula.
=RELATED('DimProductSubcategory'[EnglishProductSubcategoryName])
3. Rename the column to ProductSubcategoryName.

ProductCategoryName calculated column


1. With the DimProduct table still active, click the Extensions > Column menu, and then click Add Column.
2. In the formula bar, type the following formula:

3. Rename the column to ProductCategoryName.
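
The formula for step 2 of the ProductCategoryName column does not appear in the text above. By analogy with the subcategory column, a plausible version is sketched below; the column name EnglishProductCategoryName is an assumption based on the naming pattern of the other tables and should be checked against the DimProductCategory table in the model.

```dax
-- Assumed formula (not shown in the source). RELATED can follow the chain
-- DimProduct -> DimProductSubcategory -> DimProductCategory because each hop
-- is a many-to-one relationship.
=RELATED('DimProductCategory'[EnglishProductCategoryName])
```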


Create Calculated Columns
Create 1 calculated column – TotalNumberOfCustomers

TotalNumberOfCustomers calculated column


1. In the DimGeography table, scroll to the far right of the table. Notice the right-most column is named Add
Column (italicized), click the column heading.
2. In the formula bar, type the following formula.
=COUNTX(RELATEDTABLE(DimCustomer),DimCustomer[CustomerKey])
3. Rename the column to TotalNumberOfCustomers.
Create Calculated Columns
Create a Margin calculated column in the FactInternetSales table
1. In the model designer, select the FactInternetSales table.
2. Add a new column.
3. In the formula bar, type the following formula:

=[SalesAmount]-[TotalProductCost]

4. Rename the column to Margin.


5. Drag the column between the SalesAmount column and the TaxAmt column.
Analyze the Model in Excel
Open Excel and analyze the model you just created.
1. In the menu options, select Extensions > Model > Analyze in Excel.
2. Select Current Windows User.
3. Click OK.
4. Explore the model you created using Excel.
www.isegexecutive.education
