CTE vs Subquery vs Temp Table
You can think of a Common Table Expression (CTE) as a table subquery. A table
subquery, also sometimes referred to as a derived table, is a query that is used as the
starting point to build another query. Like a subquery, it will exist only for the duration of
the query. CTEs make the code easier to write as you can write the CTEs at the top of
your query – you can have more than one CTE, and CTEs can reference other CTEs –
and then you can use the defined CTEs in your main query.
CTEs are not physically stored on disk, and their lifespan is limited to the execution of a
single query. This means that you cannot create, alter or drop CTEs explicitly. Also, you
cannot reference a CTE from multiple queries within the same batch.
CTEs make the code easier to read, and favor reuse: imagine that in each CTE you are
defining the subset of data that you want to work on in the main query and you are
giving it a label. In the main query then you can just refer to that subset by using its
label instead of having to write the whole subquery.
CTEs also allow for some complex scenarios, such as recursive queries.
Advantages:
● Readability: CTEs enhance the readability of complex queries by breaking them
into modular, named components.
● Recursive Queries: CTEs can be used for recursive queries, where a query
refers to its own output.
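As a rough illustration of the recursive case, here is a minimal sketch that runs the SQL through Python's built-in sqlite3 module. The Staff table and its rows are hypothetical, invented only for this example. SQLite spells the construct WITH RECURSIVE; SQL Server uses plain WITH for the same pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical org chart: each employee points at a manager (NULL for the top).
conn.executescript("""
    CREATE TABLE Staff (EmployeeID INTEGER PRIMARY KEY, Name TEXT, ManagerID INTEGER);
    INSERT INTO Staff VALUES
        (1, 'Ana', NULL), (2, 'Ben', 1), (3, 'Cy', 2), (4, 'Dee', 1);
""")

rows = conn.execute("""
    WITH RECURSIVE Chain(EmployeeID, Name, Depth) AS (
        -- Anchor member: the employee with no manager.
        SELECT EmployeeID, Name, 0 FROM Staff WHERE ManagerID IS NULL
        UNION ALL
        -- Recursive member: joins back to the CTE's own output.
        SELECT s.EmployeeID, s.Name, c.Depth + 1
        FROM Staff AS s
        JOIN Chain AS c ON s.ManagerID = c.EmployeeID
    )
    SELECT Name, Depth FROM Chain ORDER BY Depth, Name
""").fetchall()

print(rows)
```

Each pass of the recursive member adds the direct reports of the rows found so far, and the recursion stops when a pass produces no new rows.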
Example:
Manisha Reddy
Let’s consider the following example, where we have a table named “Employees” with
columns “EmployeeID”, “FirstName”, “LastName”, “DepartmentID”, and “Salary”:
In this example, we’re using a CTE named “TopEmployees” to retrieve the top 10
employees with the highest salary from the “Employees” table. We then select the full
name and salary of these top employees from the CTE. The CTE allows us to simplify
the query and make it more readable by separating the top employees selection from
the final select statement.
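The original query is not reproduced here, so the following is only a sketch of what the description above could look like, run through Python's sqlite3 module. The sample rows are made up, and the SQLite dialect is an assumption: it uses LIMIT 10 and the || concatenation operator, where SQL Server would use TOP 10 and +.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employees (
        EmployeeID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT,
        DepartmentID INTEGER, Salary REAL)
""")
# Made-up sample data: 12 employees with salaries 1000, 2000, ..., 12000.
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?, ?)",
    [(i, f"First{i}", f"Last{i}", i % 3, 1000.0 * i) for i in range(1, 13)],
)

top_rows = conn.execute("""
    WITH TopEmployees AS (
        -- The CTE isolates the "top 10 by salary" selection.
        SELECT FirstName, LastName, Salary
        FROM Employees
        ORDER BY Salary DESC
        LIMIT 10
    )
    -- The main query only formats what the CTE produced.
    SELECT FirstName || ' ' || LastName AS FullName, Salary
    FROM TopEmployees
    ORDER BY Salary DESC
""").fetchall()

for full_name, salary in top_rows:
    print(full_name, salary)
```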
Temporary Table
Temporary tables are also temporary result sets; in SQL Server they are stored in the tempdb system
database. Unlike CTEs, temporary tables are physically stored on disk, and you can
create, alter or drop them explicitly. Temporary tables can be used to store and
manipulate large amounts of data and can be used in multiple queries within the same
batch.
Temporary tables can be created using the CREATE TABLE statement with the prefix
“#” or “##” for local and global temporary tables, respectively. Local temporary tables
are only accessible from the current session and are automatically dropped when the
session ends. Global temporary tables are accessible from all sessions, and they are
dropped automatically when the last session referencing them is closed.
Let’s consider the following example, where we want to create a temporary table to
store the top 10 employees with the highest salary:
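The original statement is not shown, so this is a sketch under assumptions rather than the author's code. It uses SQLite through Python's sqlite3 module, where a temporary table is created with CREATE TEMP TABLE instead of the # prefix described above; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employees (
        EmployeeID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT,
        DepartmentID INTEGER, Salary REAL)
""")
# Made-up sample data: 12 employees with salaries 1000, 2000, ..., 12000.
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?, ?)",
    [(i, f"First{i}", f"Last{i}", i % 3, 1000.0 * i) for i in range(1, 13)],
)

# Create and populate the temporary table in one step.
conn.execute("""
    CREATE TEMP TABLE TopEmployees AS
    SELECT EmployeeID, FirstName, LastName, Salary
    FROM Employees
    ORDER BY Salary DESC
    LIMIT 10
""")

# Unlike a CTE, the temporary table can be used by several later statements.
count = conn.execute("SELECT COUNT(*) FROM TopEmployees").fetchone()[0]
lowest = conn.execute("SELECT MIN(Salary) FROM TopEmployees").fetchone()[0]
print(count, lowest)

# It can also be dropped explicitly instead of waiting for the session to end.
conn.execute("DROP TABLE TopEmployees")
```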
Another reason to use a temporary table is if you have a complex query whose result needs to
be used one or more times in subsequent steps and you want to avoid spending time
and resources executing that query again and again (especially if the result set is small
compared to the originating data and/or the subsequent queries will not be able to push
any optimization down to the subquery because you are working on aggregated data, for
example).
But there is no “one-solution-fits-all” here. You must try to see if, for your use case, a
subquery is enough, or a temporary table is needed to give the query engine some
leverage to get better estimations and thus a better execution plan.
Keep in mind also that using temporary tables comes with some overhead. Aside from
the obvious space usage, resources (and thus time) will be spent just loading
them. Sometimes you might even need to create indexes on temporary tables to make
sure subsequent queries perform well.
Also, the data persisted in the temporary table is not automatically kept up to date with
any changes made to the data in the tables used in the originating query.
It is your responsibility to refresh the data in the temporary table whenever you need it
(another option would be to use Indexed Views: see below for more details on this
feature).
Best Practice
The best practice for choosing between a CTE and a temp table depends on the specific
needs of the query. If the query is simple and does not require complex logic,
then a CTE may be the best option. However, if the query is complex or works on
large amounts of data, then a temp table may be the better choice.
Ultimately, the best way to choose between a CTE and a temp table is to experiment with
both options and see which one works best for the specific needs of the query.
I’d suggest starting with CTEs because they’re easy to write and to read. If you hit a
performance wall, try ripping out a CTE and writing it to a temp table, then joining to the
temp table.
Subquery
Subqueries (also known as inner queries or nested queries) are a tool for performing
operations in multiple steps. For example, if you wanted to take the sums of several
columns, then average all of those values, you'd need to do each aggregation in a
distinct step.
Types of Subqueries:
● Single-row Subquery: Returns a single value.
● Multi-row Subquery: Returns multiple rows.
● Multi-column Subquery: Returns multiple columns.
Subqueries can be used in several places within a query, but it's easiest to start with
the FROM clause. Here's an example of a basic subquery:
Let's break down what happens when you run the above query:
First, the database runs the "inner query"—the part between the parentheses:
If you were to run this on its own, it would produce a result set like any other query. It
might sound like a no-brainer, but it's important: your inner query must actually run on
its own, as the database will treat it as an independent query. Once the inner query
runs, the outer query will run using the results from the inner query as its underlying
table:
Subqueries are required to have names, which are added after parentheses the same
way you would add an alias to a normal table. In this case, we've used the name "sub."
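Since the original queries are not shown, here is a minimal sketch of the pattern just described, run through Python's sqlite3 module. The incidents table, its columns, and its rows are hypothetical. The inner query is kept in its own string to show that it runs independently before being wrapped, aliased as sub, and filtered by the outer query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical incident data, invented for this illustration.
conn.executescript("""
    CREATE TABLE incidents (
        incident_date TEXT, day_of_week TEXT, category TEXT, resolution TEXT);
    INSERT INTO incidents VALUES
        ('2014-01-03', 'Friday',   'THEFT',   'NONE'),
        ('2014-01-03', 'Friday',   'FRAUD',   'ARREST'),
        ('2014-01-04', 'Saturday', 'THEFT',   'NONE'),
        ('2014-01-10', 'Friday',   'ASSAULT', 'NONE');
""")

# The inner query must be able to run on its own.
INNER_QUERY = "SELECT * FROM incidents WHERE day_of_week = 'Friday'"
inner_rows = conn.execute(INNER_QUERY).fetchall()

# The outer query treats the inner result as a table named "sub".
outer_rows = conn.execute(f"""
    SELECT sub.incident_date, sub.category
    FROM (
          {INNER_QUERY}
         ) AS sub
    WHERE sub.resolution = 'NONE'
    ORDER BY sub.incident_date
""").fetchall()

print(len(inner_rows), outer_rows)
```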
A quick note on formatting: The important thing to remember when using subqueries is
to provide some way for the reader to easily determine which parts of the query will be
executed together. Most people do this by indenting the subquery in some way. The
examples in this tutorial are indented quite far—all the way to the parentheses. This
isn't practical if you nest many subqueries, so it's fairly common to only indent two
spaces or so.
The above examples, as well as the practice problem, don't really require
subqueries; they solve problems that could also be solved by adding multiple
conditions to the WHERE clause. These next sections provide examples for which
subqueries are the best or only way to solve their respective problems.
If you're having trouble figuring out what's happening, try running the inner query
individually to get a sense of what its results look like. In general, it's easiest to write
inner queries first and revise them until the results make sense to you, then to move on
to the outer query.
The above query works because the result of the subquery is only one cell. Most
conditional logic will work with subqueries containing one-cell results. However, when
the inner query returns multiple results, a plain comparison such as = no longer works;
IN is the usual choice (EXISTS, ANY, and ALL also accept multi-row subqueries):
Note that you should not include an alias when you write a subquery in a conditional
statement. This is because the subquery is treated as an individual value (or set of
values in the IN case) rather than as a table.
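The two cases can be sketched as follows, again with Python's sqlite3 module and a hypothetical incidents table (names and rows are assumptions, not the original data). The first query compares against a one-cell subquery; the second uses IN because the subquery returns several rows. Neither subquery carries an alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical incident data, invented for this illustration.
conn.executescript("""
    CREATE TABLE incidents (incident_date TEXT, category TEXT);
    INSERT INTO incidents VALUES
        ('2014-01-03', 'THEFT'),
        ('2014-01-03', 'FRAUD'),
        ('2014-01-04', 'THEFT'),
        ('2014-01-10', 'ASSAULT');
""")

# One-cell subquery: the earliest date, compared with "=".
earliest_day = conn.execute("""
    SELECT category
    FROM incidents
    WHERE incident_date = (SELECT MIN(incident_date) FROM incidents)
    ORDER BY category
""").fetchall()

# Multi-row subquery: every date that had a theft, matched with IN.
theft_days = conn.execute("""
    SELECT incident_date, category
    FROM incidents
    WHERE incident_date IN (SELECT incident_date
                            FROM incidents
                            WHERE category = 'THEFT')
    ORDER BY incident_date, category
""").fetchall()

print(earliest_day)
print(theft_days)
```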
Joining subqueries
You may remember that you can filter queries in joins. It's fairly common to join a
subquery that hits the same table as the outer query rather than filtering in the WHERE
clause. The following query produces the same results as the previous example:
This can be particularly useful when combined with aggregations. When you join, the
requirements for your subquery output aren't as stringent as when you use the WHERE
clause. For example, your inner query can output multiple results. The following query
ranks all of the results according to how many incidents were reported in a given day. It
does this by aggregating the total number of incidents each day in the inner query, then
using those values to sort the outer query:
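A sketch of the ranking idea described above, using Python's sqlite3 module. The incidents table and its rows are hypothetical. The inner query counts incidents per day, and the outer query joins each incident to its day's count and sorts by it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical incident data, invented for this illustration.
conn.executescript("""
    CREATE TABLE incidents (incident_date TEXT, category TEXT);
    INSERT INTO incidents VALUES
        ('2014-01-03', 'THEFT'),
        ('2014-01-03', 'FRAUD'),
        ('2014-01-04', 'THEFT'),
        ('2014-01-10', 'ASSAULT');
""")

ranked = conn.execute("""
    SELECT i.incident_date, i.category, sub.incidents_that_day
    FROM incidents AS i
    JOIN (
          -- Inner query: one row per day with that day's incident count.
          SELECT incident_date, COUNT(*) AS incidents_that_day
          FROM incidents
          GROUP BY incident_date
         ) AS sub
      ON i.incident_date = sub.incident_date
    ORDER BY sub.incidents_that_day DESC, i.incident_date, i.category
""").fetchall()

for row in ranked:
    print(row)
```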
Subqueries can be very helpful in improving the performance of your queries. Let's
revisit the Crunchbase Data briefly. Imagine you'd like to aggregate all of the
companies receiving investment and companies acquired each month. You could do
that without subqueries if you wanted to, but don't actually run this as it will take
minutes to return:
Note that in order to do this properly, you must join on date fields, which causes a
massive "data explosion." Basically, what happens is that you're joining every row in a
given month from one table onto every row in the same month in the other table, so the
number of rows returned is extremely large. Because of this multiplicative effect, you
must use COUNT(DISTINCT) instead of COUNT to get accurate counts.
If you'd like to understand this a little better, you can do some extra research on
Cartesian products. It's also worth noting that the FULL JOIN and COUNT above
actually run pretty fast; it's the COUNT(DISTINCT) that takes forever. More on that in
the lesson on optimizing queries.
Of course, you could solve this much more efficiently by aggregating the two tables
separately, then joining them together so that the counts are performed across far
smaller datasets:
Note: We used a FULL JOIN above just in case one table had observations in a month
that the other table didn't. We also used COALESCE to display months when the
acquisitions subquery didn't have month entries (presumably no acquisitions occurred
in those months). We strongly encourage you to re-run the query without some of these
elements to better understand how they work. You can also run each of the subqueries
independently to get a better understanding of them as well.
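The "aggregate first, then join" approach can be sketched like this with Python's sqlite3 module. The investments and acquisitions tables and their rows are hypothetical stand-ins for the Crunchbase data. One deliberate difference from the description above: FULL JOIN is only available in newer SQLite releases, so this sketch builds the list of months with a UNION and LEFT JOINs both aggregates onto it, which yields the same months-from-either-table result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical stand-ins for the two tables, invented for this illustration.
conn.executescript("""
    CREATE TABLE investments (company TEXT, month TEXT);
    CREATE TABLE acquisitions (company TEXT, month TEXT);
    INSERT INTO investments VALUES ('a', '2014-01'), ('b', '2014-01'), ('c', '2014-02');
    INSERT INTO acquisitions VALUES ('x', '2014-01'), ('y', '2014-03');
""")

monthly = conn.execute("""
    SELECT m.month,
           COALESCE(inv.companies_funded, 0)   AS companies_funded,
           COALESCE(acq.companies_acquired, 0) AS companies_acquired
    FROM (
          -- Every month that appears in either table.
          SELECT month FROM investments
          UNION
          SELECT month FROM acquisitions
         ) AS m
    LEFT JOIN (
          -- Aggregated before joining, so the join sees one row per month.
          SELECT month, COUNT(DISTINCT company) AS companies_funded
          FROM investments
          GROUP BY month
         ) AS inv ON inv.month = m.month
    LEFT JOIN (
          SELECT month, COUNT(DISTINCT company) AS companies_acquired
          FROM acquisitions
          GROUP BY month
         ) AS acq ON acq.month = m.month
    ORDER BY m.month
""").fetchall()

for row in monthly:
    print(row)
```

Because each subquery is reduced to one row per month before the join, the join never multiplies rows, and COALESCE fills in zero for months missing from one side.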
It's certainly not uncommon for a dataset to come split into several parts, especially if
the data passed through Excel at any point (Excel can only handle ~1M rows per
spreadsheet). The two tables used above can be thought of as different parts of the
same dataset—what you'd almost certainly like to do is perform operations on the entire
combined dataset rather than on the individual parts. You can do this by using a
subquery:
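One common way to do this, shown here as a sketch rather than the original query, is to stack the parts with UNION ALL inside a subquery and run the operation on the combined result. The table names and rows are hypothetical, and the SQL is run through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two hypothetical parts of the same dataset, with identical columns.
conn.executescript("""
    CREATE TABLE investments_part1 (company TEXT, amount REAL);
    CREATE TABLE investments_part2 (company TEXT, amount REAL);
    INSERT INTO investments_part1 VALUES ('a', 10.0), ('b', 20.0);
    INSERT INTO investments_part2 VALUES ('c', 30.0), ('d', 40.0), ('e', 50.0);
""")

total_rows, total_amount = conn.execute("""
    SELECT COUNT(*) AS total_rows, SUM(sub.amount) AS total_amount
    FROM (
          -- UNION ALL keeps every row from both parts, including duplicates.
          SELECT * FROM investments_part1
          UNION ALL
          SELECT * FROM investments_part2
         ) AS sub
""").fetchone()

print(total_rows, total_amount)
```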
Stored Procedure
A stored procedure is a set of SQL statements that can be saved and executed later. It
can take parameters, making it a reusable and efficient way to perform specific tasks or
operations on a database.
Advantages:
● Code Reusability: Stored procedures can be called from multiple locations,
promoting code reuse.
● Enhanced Security: Users can execute a stored procedure without direct table
access, improving security.
Example:
subquery:
CTE:
Is there any reason that we might choose to use a CTE over a subquery? In terms of
performance, they are pretty much the same. Remember from our talk on the order of
operations in SQL that a subquery will run before the main query, and that is the same
with the CTE, so in either case you are basically querying a query.
If the performance of the two options is the same, why would you choose a CTE over
a subquery? For a simple query like my example, it’s probably going to come down to
personal preference. However, for more complex queries that require multiple
subqueries, using CTEs can make your query easier to understand. This is especially
important when you are writing queries that will need to be used or edited by multiple
users. With the subquery structure, it isn’t always easy to see what the author intended.
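To make the comparison concrete, here is a sketch that writes one query both ways and checks that the results match. The Employees rows and the DeptAvg name are hypothetical, chosen only for illustration, and the SQL runs through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical sample data, invented for this illustration.
conn.executescript("""
    CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, DepartmentID INTEGER, Salary REAL);
    INSERT INTO Employees VALUES
        (1, 1, 4000), (2, 1, 5000), (3, 2, 6000), (4, 2, 8000), (5, 3, 5500);
""")

# Version 1: the aggregation lives in a subquery in the FROM clause.
subquery_rows = conn.execute("""
    SELECT sub.DepartmentID, sub.AvgSalary
    FROM (
          SELECT DepartmentID, AVG(Salary) AS AvgSalary
          FROM Employees
          GROUP BY DepartmentID
         ) AS sub
    WHERE sub.AvgSalary > 5000
    ORDER BY sub.DepartmentID
""").fetchall()

# Version 2: the same aggregation, named up front as a CTE.
cte_rows = conn.execute("""
    WITH DeptAvg AS (
        SELECT DepartmentID, AVG(Salary) AS AvgSalary
        FROM Employees
        GROUP BY DepartmentID
    )
    SELECT DepartmentID, AvgSalary
    FROM DeptAvg
    WHERE AvgSalary > 5000
    ORDER BY DepartmentID
""").fetchall()

print(subquery_rows == cte_rows, cte_rows)
```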
You can create multiple CTEs to use in a query, just as you can create multiple
subqueries. You can also name your CTE whatever you like. I used CTE before to
make it clear which part was the CTE, but it is better to use descriptive names,
which also make it easier to understand what your query is doing.
Here’s an example:
The query using multiple CTEs is easier to read, but you get the same results no matter
which way you write the query.
One item to point out here: when using multiple CTEs, you only need to use the WITH
keyword once, and you separate the individual CTEs with commas.
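A minimal sketch of that syntax, with hypothetical table contents and CTE names (DeptAvg, AboveAvg), run through Python's sqlite3 module. Note the single WITH, the comma between the two definitions, and that the second CTE refers to the first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical sample data, invented for this illustration.
conn.executescript("""
    CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, DepartmentID INTEGER, Salary REAL);
    INSERT INTO Employees VALUES
        (1, 1, 4000), (2, 1, 5000), (3, 2, 6000), (4, 2, 8000), (5, 3, 5500);
""")

above_avg = conn.execute("""
    WITH DeptAvg AS (
        -- First CTE: average salary per department.
        SELECT DepartmentID, AVG(Salary) AS AvgSalary
        FROM Employees
        GROUP BY DepartmentID
    ),
    AboveAvg AS (
        -- Second CTE: employees paid more than their department's average.
        SELECT e.EmployeeID
        FROM Employees AS e
        JOIN DeptAvg AS d ON d.DepartmentID = e.DepartmentID
        WHERE e.Salary > d.AvgSalary
    )
    SELECT EmployeeID FROM AboveAvg ORDER BY EmployeeID
""").fetchall()

print(above_avg)
```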
So, all this to say that CTE and subqueries will accomplish the same thing. What about
temp tables? There is one major difference between CTE/subquery and temp tables. A
temp table can be accessed by multiple queries in the same SQL session. A
CTE/subquery is only available for a single query.
What does that mean? Let’s say that I had multiple queries that needed to use the
same ‘mask’ CTE. I could put that CTE at the beginning of each query, but that would
require a lot of extra typing (which is not high on my list of ways to waste time) and the
performance would deteriorate: the same temporary result set would be computed for each
query, and that takes extra time. This is why we use temp tables.
Let’s take a look at how this would work.
Once you create and populate your temp table (the SELECT clause is populating the
table), you can query it multiple times until you disconnect your SQL session. Note that
the syntax depends on the database: SQL Server supports SELECT ... INTO with a
#-prefixed table name, while SQLite uses CREATE TEMP TABLE ... AS SELECT.
So, now we can query like normal using our temp.mask table.
Notice that we still used a CTE. Your temp table works just like any other table when
querying; you just have to create it once per session. There are some things to keep in
mind. Your temp table is stored rather than recomputed for each query (as a
CTE/subquery is), so it can significantly improve performance if you are running multiple
queries using the same temporary data. However, because the temp table has to be stored,
if you are only using it for a single query, performance will likely be worse with a temp
table.
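The scoping difference can be seen in a short sketch using Python's sqlite3 module. The Employees rows and the HighPaid name are hypothetical, not the 'mask' data from the original example. A CTE defined in one statement is gone by the next one, while a temp table created once stays available to later queries on the same connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical sample data, invented for this illustration.
conn.executescript("""
    CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, DepartmentID INTEGER, Salary REAL);
    INSERT INTO Employees VALUES
        (1, 1, 4000), (2, 1, 5000), (3, 2, 6000), (4, 2, 8000), (5, 3, 5500);
""")

# A CTE exists only inside the statement that defines it.
conn.execute("""
    WITH HighPaid AS (SELECT * FROM Employees WHERE Salary > 5000)
    SELECT COUNT(*) FROM HighPaid
""").fetchone()
try:
    conn.execute("SELECT COUNT(*) FROM HighPaid")
    cte_visible_later = True
except sqlite3.OperationalError:
    # SQLite reports that no table named HighPaid exists.
    cte_visible_later = False

# A temp table is created once and then reused by later statements.
conn.execute("""
    CREATE TEMP TABLE HighPaid AS
    SELECT * FROM Employees WHERE Salary > 5000
""")
high_paid_count = conn.execute("SELECT COUNT(*) FROM temp.HighPaid").fetchone()[0]
high_paid_max = conn.execute("SELECT MAX(Salary) FROM temp.HighPaid").fetchone()[0]

print(cte_visible_later, high_paid_count, high_paid_max)
```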
In summary, we can use CTE/subqueries interchangeably, but the CTE is easier to read
and see what is going on, so it is best used when a query will be used/edited by other
users. The use case for these is single queries — the information will not need to be
accessed by multiple queries in a single SQL session. A temp table is also a temporary
data source, but it will be available for multiple queries during the same SQL session.
Indexed Views
A special kind of view, the indexed view, can be created so that the produced result
is materialized and persisted in the database data file. With indexed views, the result
doesn’t need to be recalculated every time, so they are great for improving read
performance. In HTAP scenarios they can give a great performance boost. The
database engine will also make sure that every time data in one of the base tables
used in an indexed view is updated, the persisted result is updated too, so that you
always have fresh and updated values.