
ACADEMY

Exasol SQL

– Exasol supports most of the SQL:2008 standard
– Some features are not (yet) supported, e.g. cursors
– Some extensions beyond the standard

DQL: SELECT, UNION [ALL], INTERSECT / MINUS
DCL: CREATE USER, CREATE ROLE, GRANT PRIVILEGES
DDL: CREATE SCHEMA, CREATE TABLE, CREATE VIEW
DML: INSERT, UPDATE, DELETE/TRUNCATE, MERGE, IMPORT/EXPORT

2 Exasol SQL ACADEMY


Database definition (DDL)


The structure of the database is defined by means of the Data Definition Language (DDL).
This concerns, in particular, the creation of schemas and all of the schema objects
contained therein such as tables, views, functions and scripts. It also includes constraints
on the tables; the content of tables, however, is not defined via DDL.
Access control (DCL)
The SQL statements of the Data Control Language (DCL) are used to control the database
access rights. Through the management of users and roles as well as the granting of
privileges, it can be determined who is permitted to perform what actions in the database.
Manipulation of the database (DML)
The content of tables can be changed using the Data Manipulation Language (DML).
Important statements are:
INSERT: Inserts constant values or the results of a subquery.
UPDATE: Changes the content of a table. The rows to be updated can be restricted by a WHERE condition.
DELETE: Deletes selected rows of a table.
If an error occurs during the update process, no ROLLBACK is executed.
TRUNCATE: Deletes the whole content of a table. In contrast to e.g. Oracle, only the content is deleted; the privileges are not affected.
MERGE: Combines the three statements UPDATE, DELETE and INSERT.
Query language (DQL)
The Data Query Language (DQL), i.e. the SELECT statement, is used to analyze the database.
Important constructs are:
Table operators: UNION [ALL], INTERSECT, MINUS
Identifiers

– Each object in the database has a unique name

Object hierarchy: Database > root catalog (users, roles, connections, schemas) > schema (tables, views, functions, scripts) > table (columns, constraints)

– These names are referenced via SQL identifiers
– SQL identifiers: max. 128 characters
– Different handling of regular and delimited identifiers


Every database object has its own (local) unique name.
Within SQL commands, those names are referenced via SQL identifiers.
SQL identifiers are limited to 128 characters.
Regular and delimited identifiers are handled differently.
Regular identifiers
– Defined without quotes
– Case-insensitive
– Must not contain the following ASCII characters:
"'+-*/<>={}[]().,;:?|&^%!
– Stored in the database in upper case:

Identifier   Name in database
ABC          ABC
aBc          ABC
a123         A123

CREATE TABLE aBc (…);
SELECT * FROM abc;

Regular identifiers are stated without quotation marks. They must start with a letter (Unicode classes Lu, Ll, Lt, Lm, Lo and Nl). The subsequent characters may additionally come from the Unicode classes Mn, Mc, Nd, Pc and Cf. This is SQL standard compliant.
This means, for example, that German umlauts are also allowed in regular identifiers.
A further restriction besides the character set is that reserved words cannot be used as regular identifiers. If you want to use characters that are prohibited in regular identifiers, you can use delimited identifiers (see next section). For example, to use the word table as an identifier, you have to quote it ("TABLE"), since it is a reserved keyword.
Regular identifiers are always stored in the database in upper case. Therefore, they are not case-sensitive in SQL text. As shown in the example above, the two identifiers ABC and aBc are identical.
Delimited identifiers
– Enclosed in double quotation marks: "abc"
– Case-sensitive
– May contain any characters except the dot ('.')
– Exception: For users and roles, the dot is allowed (e.g. email addresses)
– Can be reserved keywords (except "ROWNUM")
– Stored case-sensitively in the database
– Exception: Users and roles are always case-insensitive (upper case)

Identifier   Name in database
"ABC"        ABC
"abc"        abc
"_x_"        _x_
"ab""c"      ab"c

CREATE TABLE "abc" (…);
SELECT * FROM "abc";


These identifiers are names enclosed in double quotation marks. Any character can be contained within the quotation marks except the dot ('.'). If you want to use a quotation mark in the name, it must be doubled (e.g. "ab""c" denotes the name ab"c).
Except for users, roles and passwords, identifiers in quotation marks are always stored case-sensitively in the database.
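To make these rules concrete, here is a small Python sketch (not Exasol code) that models how an identifier ends up stored: it checks the Unicode classes listed for regular identifiers, folds regular identifiers to upper case, and un-doubles quotes in delimited ones. The function names are hypothetical; reserved words and the special handling of users and roles are not modeled, and Python's `str.upper()` applies full Unicode case mapping (e.g. `ß` becomes `SS`), which may differ from what Exasol stores.

```python
import unicodedata

# Unicode classes quoted in the text above
FIRST_CHAR_CLASSES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
FOLLOWING_CHAR_CLASSES = FIRST_CHAR_CLASSES | {"Mn", "Mc", "Nd", "Pc", "Cf"}
MAX_IDENTIFIER_LENGTH = 128


def is_valid_regular_identifier(name: str) -> bool:
    """Character-class check only; reserved words are not checked here."""
    if not name or unicodedata.category(name[0]) not in FIRST_CHAR_CLASSES:
        return False
    return all(unicodedata.category(ch) in FOLLOWING_CHAR_CLASSES for ch in name[1:])


def stored_name(identifier: str) -> str:
    """Return the stored name of a schema object (users/roles not modeled)."""
    if len(identifier) >= 2 and identifier[0] == identifier[-1] == '"':
        # Delimited identifier: drop outer quotes, un-double inner quotes, keep case
        name = identifier[1:-1].replace('""', '"')
        if "." in name:
            raise ValueError("delimited identifiers must not contain a dot")
    elif is_valid_regular_identifier(identifier):
        # Regular identifier: folded to upper case
        name = identifier.upper()
    else:
        raise ValueError(f"not a valid regular identifier: {identifier}")
    if len(name) > MAX_IDENTIFIER_LENGTH:
        raise ValueError("identifier longer than 128 characters")
    return name


print(stored_name("aBc"), stored_name('"ab""c"'))   # ABC ab"c
```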
Reserved keywords
– Cannot be used as regular identifiers
– Usually part of the SQL grammar
– List of reserved words in EXA_SQL_KEYWORDS
– Examples:
"SELECT" "YES" "USER" "DATE" "TIME" …
– EXAplus: syntax highlighting for reserved keywords

SELECT 14 AS "IDENTITY";
SELECT current_user AS "USER";


There are a number of reserved words in Exasol which cannot be used as regular identifiers.
For example, the keyword SELECT is a reserved word. A table with this name can only be created if the name is enclosed in double quotation marks. However, "SELECT" as a table name is different from table names such as "Select" or "seLect".
The list of reserved words can be found in the EXA_SQL_KEYWORDS system table:
SELECT * FROM exa_sql_keywords WHERE reserved = true;
Exasol Data Types

BOOLEAN
CHAR(n)                               n in [1;2,000]
DATE
DECIMAL(p,s)                          p in [1;36], s in [0;p]
DOUBLE PRECISION
GEOMETRY[(srid)]                      srid defines the spatial reference system
INTERVAL DAY [(p)] TO SECOND [(fp)]   p in [1;9], fp in [0;9], accurate to a millisecond
INTERVAL YEAR [(p)] TO MONTH          p in [1;9]
TIMESTAMP                             accurate to a millisecond
TIMESTAMP WITH LOCAL TIME ZONE        aware of the session time zone
VARCHAR(n)                            n in [1;2,000,000]

– Several aliases: DOUBLE = DOUBLE PRECISION, INT = DEC(18,0), …


Common data types are supported:

BOOLEAN
CHAR
VARCHAR
DECIMAL
DOUBLE PRECISION
DATE
INTERVAL
TIMESTAMP

Note that in Exasol, DATE does not include hours, minutes and seconds.

Not supported types:
Blob
Binary (GEOMETRY)
Clob (use VARCHAR)
Data Type: String
– CHAR(n [CHAR]) [[CHARACTER SET] encoding]
– Encoding: UTF-8 or ASCII (7 bit)
– Maximum length: 2,000 characters
– Fixed-size and padded with whitespace
– VARCHAR(n [CHAR]) [[CHARACTER SET] encoding]
– Encoding: UTF-8 or ASCII (7 bit)
– Maximum length: 2 million characters
– Variable-size, depending on the actual string length: 'a' is different from 'a '
– An empty string ('') is translated to a NULL value
– String literals are enclosed in single quotes:

-- string concatenation
SELECT 'hello'||'!';

– String comparison is always case-sensitive!


VARCHAR can store up to 2 million characters and CHAR up to 2,000 characters, either in ASCII or in UTF-8 encoding. In UTF-8, those 2 million characters can consume up to 8 million bytes, while ASCII cannot contain any country-specific characters.
The conversion into the (designated) output encoding of the client is done by the corresponding driver.
The database can store and process all existing characters of a certain character set, independent of the client's ability to display them. For example, EXAplus converts linefeeds (\n) into spaces to preserve the tabular output formatting.
String literals are automatically interpreted with the best-fitting data type. That is why an explicit cast is usually not necessary.
Any data type can be converted to a string using the function TO_CHAR. For the opposite direction, you can use the functions TO_DATE, TO_TIMESTAMP and TO_NUMBER. The optional parameter 'format' defines how the data should be interpreted. More details on format strings can be found in the User Manual, Section 2.5 "Format Models".
The length of a string can be determined via the LENGTH function family (either the number of characters, the number of bytes or the number of bits).
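The relation between characters, bytes and bits can be illustrated with a short Python sketch (Python, not SQL; the helper name is hypothetical). It counts characters, UTF-8 bytes and bits of a string, showing that a single UTF-8 character can take up to four bytes, which is why 2 million characters may need up to 8 million bytes.

```python
def string_lengths(text: str) -> tuple:
    """Return (characters, UTF-8 bytes, bits) for a string."""
    n_bytes = len(text.encode("utf-8"))
    return len(text), n_bytes, 8 * n_bytes


print(string_lengths("abc"))     # (3, 3, 24)
print(string_lengths("Größe"))   # (5, 7, 56): ö and ß take two bytes each
print(string_lengths("😀"))      # (1, 4, 32): four bytes for one character
```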
Data Type: DATE

– DATE:
– Consists of (year, month, day).
– Valid range is '0001-01-01' to '9999-12-31'.
– Either a completely valid date or NULL.
– Values like '2009-01-00' or '2009-02-30' are not possible!
– Current date:
– SYSDATE (database time zone)
– CURRENT_DATE (session time zone)


The DATE data type stores a date consisting of day, month and year (4 digits). Only valid date values in the range 0001-01-01 to 9999-12-31 are allowed.
It is not possible to omit fields (day/month) or to set them to invalid values (0, >12/31).
Therefore, a date is either completely valid or NULL. Values like '2009-01-00' or '2009-02-30' are not allowed and lead to an 'invalid date value' data exception.
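Python's `datetime.date` happens to enforce essentially the same rules (years 1 to 9999, no day 0, no 30 February), so a short sketch can mimic this behavior outside the database. The helper name is hypothetical; it only accepts the ISO form 'YYYY-MM-DD' and maps the empty string to NULL (`None`), following the string rules above.

```python
from datetime import date


def parse_date(text: str):
    """Parse 'YYYY-MM-DD' into a date; the empty string becomes None (NULL)."""
    if text == "":
        return None
    year, month, day = (int(part) for part in text.split("-"))
    try:
        # date() rejects day 0, 30 February, month 13, year 0, ...
        return date(year, month, day)
    except ValueError as exc:
        raise ValueError(f"invalid date value: {text}") from exc


print(parse_date("2009-01-16"))   # 2009-01-16
```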
Data Type: TIMESTAMP

– TIMESTAMP:
– Consists of (Date, Time accurate to a millisecond).
– Time from '00:00:00' to '23:59:59.999'

– Current timestamp:
– SYSTIMESTAMP (database time zone)
– CURRENT_TIMESTAMP, LOCALTIMESTAMP, now() (session time zone)

– TIMESTAMP WITH LOCAL TIME ZONE:
– Internal normalization to UTC timestamp
– For any in- or output, the UTC timestamp is implicitly converted to a timestamp in the session time zone
– This may lead to different results for the same query within different sessions:

INSERT … VALUES ('2018-01-05 20:15:00.000', …);   -- session 1
SELECT …;   -- session 2 (different time zone) returns 2018-01-06 04:15:00.000
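The normalization described above can be sketched in Python, with fixed UTC offsets standing in for the session time zones (an assumption; real sessions use named time zones with daylight saving rules). A value entered in a UTC session and read back in a UTC+8 session reproduces the shift shown in the example. `store` and `display` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone

FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def store(local_text: str, session_offset_hours: int) -> datetime:
    """Interpret input in the session time zone and normalize it to UTC."""
    naive = datetime.strptime(local_text, FORMAT)
    session_tz = timezone(timedelta(hours=session_offset_hours))
    return naive.replace(tzinfo=session_tz).astimezone(timezone.utc)


def display(stored_utc: datetime, session_offset_hours: int) -> str:
    """Convert the stored UTC value into the session time zone (millisecond precision)."""
    session_tz = timezone(timedelta(hours=session_offset_hours))
    return stored_utc.astimezone(session_tz).strftime(FORMAT)[:-3]


utc_value = store("2018-01-05 20:15:00.000", session_offset_hours=0)
print(display(utc_value, session_offset_hours=0))   # 2018-01-05 20:15:00.000
print(display(utc_value, session_offset_hours=8))   # 2018-01-06 04:15:00.000
```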


The TIMESTAMP data type is similar to the DATE data type, but also contains a time (hours, minutes, seconds, milliseconds). The valid range is [0001-01-01 00:00:00.000; 9999-12-31 23:59:59.999]. Invalid values lead to an 'invalid time value' data exception.
Datetime literals are created by specifying the corresponding keyword (date/timestamp), followed by a string with the corresponding value (ISO format YYYY-MM-DD HH:MI:SS.FF3). NULL values can be inserted by using the empty string (timestamp '').
WHERE x = date '2009-01-16'
WHERE y = timestamp '2009-03-12 12:37:23.003'
If needed, plain strings are automatically converted into the corresponding data type. Note that this conversion uses the current datetime formats for displaying and interpreting datetime strings (NLS_DATE_FORMAT, NLS_TIMESTAMP_FORMAT).
WHERE x = '2009-01-16'
WHERE x = cast('16.01.2009' as timestamp)
If a string does not match the current format, you can specify your own format with the conversion functions TO_DATE and TO_TIMESTAMP.
WHERE x = to_date('16.01.2009', 'DD.MM.YYYY')
Regular subselect

– Example:

SELECT * FROM (
SELECT ma.*, ci.CITY_NAME AS CITY_NAME
FROM MARKETS ma JOIN CITIES ci
ON ma.CITY_ID = ci.CITY_ID
WHERE ci.CITY_NAME LIKE 'Neu%'
)
WHERE POPULATION > 10000;


Subselects can be used like tables in SQL statements as shown above.


Common Table Expression (CTE):

– Example:

WITH CUSTOMERS_IN_CITIES_STARTING_WITH_NEU AS
(
SELECT ma.*, ci.CITY_NAME AS CITY_NAME
FROM MARKETS ma JOIN CITIES ci
ON ma.CITY_ID = ci.CITY_ID
WHERE ci.CITY_NAME LIKE 'Neu%'
)
SELECT * FROM CUSTOMERS_IN_CITIES_STARTING_WITH_NEU
WHERE POPULATION > 10000;


If a subquery is used multiple times within the same query, using a CTE is recommended.

CTEs can also be chained, each one building on the previous ones:

with a as (select …),
b as (select … from a),
c as (select … from b, a),
…
z as (select …)
select * from z;
Correlated subselect

– Example: all markets in cities with more than 5 markets
– Main usage: IN or EXISTS
WITH cte1 AS (
SELECT count(1) AS MARKET_COUNT, CITY_ID
FROM MARKETS
GROUP BY CITY_ID
HAVING count(1) > 5
)
SELECT ma.*
FROM MARKETS ma WHERE EXISTS
( SELECT 1 FROM cte1
WHERE cte1.CITY_ID = ma.CITY_ID
);


Conceptually, the correlated subquery (which references ma.CITY_ID of the outer query) is evaluated for each row of the outer query.
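The logic of the example is a semi-join: a market is kept if at least one matching row exists in the aggregated CTE. A Python sketch over a list of dictionaries makes the two steps explicit; the column names follow the example, while the input format and the function name are hypothetical.

```python
from collections import Counter


def markets_in_busy_cities(markets, more_than=5):
    # CTE: count markets per CITY_ID, keep cities with more than `more_than` markets
    counts = Counter(market["CITY_ID"] for market in markets)
    busy_cities = {city for city, n in counts.items() if n > more_than}
    # WHERE EXISTS (...): keep each market whose city appears in the CTE result
    return [market for market in markets if market["CITY_ID"] in busy_cities]
```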
Aliases: The problem

– Aliases cannot be used within the same subselect for:
– GROUP BY
– WHERE
– CASE WHEN
– HAVING

SELECT count(1) AS MARKET_COUNT, CITY_ID
FROM MARKETS
GROUP BY CITY_ID
HAVING MARKET_COUNT > 5
;
> [42000] object MARKET_COUNT not found [line 4, column 8]


According to the SQL standard, aliases cannot be used within the same subselect for:
GROUP BY
WHERE
CASE WHEN
HAVING
Aliases: Solution with LOCAL

– The keyword LOCAL allows the use of aliases

SELECT count(1) AS MARKET_COUNT, CITY_ID
FROM MARKETS
GROUP BY CITY_ID
HAVING local.MARKET_COUNT > 5
;

MARKET_COUNT CITY_ID
6 9661
10 9671
6 4897
10 9652
8 9672


Exasol provides the keyword LOCAL to reference aliases, as shown in the example above.
CURRENT_USER vs. SCOPE_USER

– CURRENT_USER shows the currently connected user

SELECT CURRENT_USER;

CURRENT_USER
JIM

– SCOPE_USER shows the owner of the current object
– New in Exasol version 6.1

ALTER SCHEMA X CHANGE OWNER BARBARA;
CREATE VIEW X.USER_TEST AS SELECT CURRENT_USER, SCOPE_USER;
SELECT * FROM X.USER_TEST;

CURRENT_USER SCOPE_USER
JIM BARBARA


CURRENT_USER shows the username of the currently connected user.
SCOPE_USER (new in Exasol version 6.1) is the owner of the current object (for views and UDF scripts).
Scalar Functions

– Computed for each row of the input table
– Available for several input types:
– Numeric: SELECT power(2,10); ➢ 1024
– Strings: SELECT substr('abcdefg', 2, 5); ➢ 'bcdef'
– Date / Timestamp: SELECT year(sysdate) AS current_year; ➢ 2016
– Geometry: SELECT st_distance('POINT(0 0)', 'POINT(3 4)'); ➢ 5
– …

Scalar functions are computed separately for each row of the input table.
They are available for several input types, as shown above.
Explicit Conversion between Date Types

SELECT TO_CHAR(current_date, 'DD. MONTH YYYY', 'NLS_DATE_LANGUAGE=GERMAN');
SELECT TO_DATE('08222016', 'MMDDYYYY');
SELECT TO_TIMESTAMP('08222016 18:36:55', 'MMDDYYYY HH:MI:SS');
-- Format and language are optional; defaults are NLS_DATE_FORMAT,
-- NLS_TIMESTAMP_FORMAT and NLS_DATE_LANGUAGE (from the session parameters)

Special format elements

Date         YYYY   WW   IYYY   IW
2008-12-31   2008   53   2009   01
2009-01-01   2009   01   2009   01
2005-12-31   2005   53   2005   52
2006-01-01   2006   01   2005   52


Datetime data types can be explicitly converted among each other and into string data
types. A conversion from and to UNIX timestamp can be done by its definition (seconds
since 01.01.1970) and the use of functions SECONDS_BETWEEN and ADD_SECONDS.
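A minimal Python sketch of this conversion, treating timestamps as naive UTC values (an assumption). The comments name the Exasol functions from the text that would play the same role, assuming SECONDS_BETWEEN returns the first argument minus the second; the Python function names are hypothetical.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)   # UNIX epoch; timestamps treated as naive UTC


def to_unix_seconds(ts: datetime) -> float:
    # role of SECONDS_BETWEEN(ts, TIMESTAMP '1970-01-01 00:00:00')
    return (ts - EPOCH).total_seconds()


def from_unix_seconds(seconds: float) -> datetime:
    # role of ADD_SECONDS(TIMESTAMP '1970-01-01 00:00:00', seconds)
    return EPOCH + timedelta(seconds=seconds)


print(to_unix_seconds(datetime(1970, 1, 2, 0, 0, 1)))   # 86401.0
print(from_unix_seconds(86401))                          # 1970-01-02 00:00:01
```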
A special case exists for the format elements
'WW' for week
'IW' for ISO week
'IYYY' for ISO year
WW counts weeks of seven days starting on 1 January, while IW counts ISO weeks, which always run from Monday to Sunday; the first ISO week of a year is the first week with at least four days in that year. As a result, the first or last days of a year can be assigned to a week of the previous or the following year.
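Python's standard library can reproduce the table of special format elements, which makes it easy to check edge cases. `date.isocalendar()` returns the ISO year and week; the WW value is computed here under the assumption that it counts seven-day blocks starting on 1 January, as described above. `week_elements` is a hypothetical helper name.

```python
from datetime import date


def week_elements(d: date) -> dict:
    iso_year, iso_week, _weekday = d.isocalendar()
    day_of_year = d.timetuple().tm_yday
    return {
        "YYYY": d.year,
        "WW": (day_of_year - 1) // 7 + 1,   # 7-day blocks counted from 1 January
        "IYYY": iso_year,                   # ISO (week-based) year
        "IW": iso_week,                     # ISO week, Monday to Sunday
    }


for d in [date(2008, 12, 31), date(2009, 1, 1), date(2005, 12, 31), date(2006, 1, 1)]:
    print(d, week_elements(d))
```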
Using CAST

– Can be used to convert values into any other data type:

SELECT cast(42 AS char(10)) char_col;
➢ '42        '


In addition to the standard conversion functions, CAST can be used to flexibly convert values into any other required data type.
Regular Expressions: REGEXP_INSTR

– REGEXP_INSTR, REGEXP_SUBSTR and REGEXP_REPLACE support regular expressions
– Example: Is there a bracket in a string?

SELECT
DISTINCT PRODUCT_GROUP_DESC,
CASE REGEXP_INSTR(PRODUCT_GROUP_DESC, '[()]')
WHEN 0 THEN false ELSE true
END AS HAS_BRACKET
FROM ARTICLE;

PRODUCT_GROUP_DESC HAS_BRACKET
Frozen Foods false
Drinks (returnable bottles) true


The functions INSTR, SUBSTR and REPLACE only work with constant search strings.
To search for patterns, use the corresponding functions based on regular expressions.
Exasol supports Perl Compatible Regular Expressions (PCRE).

The example above shows how to use REGEXP_INSTR().


Regular Expressions: REGEXP_REPLACE

– Example: Replace all vowels

SELECT
DISTINCT PRODUCT_GROUP_DESC,
REGEXP_REPLACE(PRODUCT_GROUP_DESC, '[aeiou]', '#')
AS REPLACE_VOWELS
FROM ARTICLE;

PRODUCT_GROUP_DESC            REPLACE_VOWELS
Frozen Foods                  Fr#z#n F##ds
Drinks (returnable bottles)   Dr#nks (r#t#rn#bl# b#ttl#s)


The example above replaces all lowercase vowels with the # character.

Regular Expressions: REGEXP_SUBSTR

– Example: Extraction of email address

SELECT REGEXP_SUBSTR
('My mail address is my_mail@yahoo.com',
'(?i)[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,4}')
AS EMAIL;

EMAIL
my_mail@yahoo.com


The example above shows how an email address can be extracted from a string.
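The three examples can be cross-checked with Python's `re` module, whose syntax agrees with PCRE for these simple patterns (character classes, quantifiers and the leading `(?i)` flag). This is not Exasol code, and the function names are hypothetical.

```python
import re

EMAIL_PATTERN = r"(?i)[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,4}"


def has_bracket(text: str) -> bool:
    # like REGEXP_INSTR(text, '[()]') <> 0
    return re.search(r"[()]", text) is not None


def replace_vowels(text: str) -> str:
    # like REGEXP_REPLACE(text, '[aeiou]', '#')
    return re.sub(r"[aeiou]", "#", text)


def extract_email(text: str):
    # like REGEXP_SUBSTR(text, EMAIL_PATTERN); None if nothing matches
    match = re.search(EMAIL_PATTERN, text)
    return match.group(0) if match else None


print(replace_vowels("Drinks (returnable bottles)"))   # Dr#nks (r#t#rn#bl# b#ttl#s)
print(extract_email("My mail address is my_mail@yahoo.com"))   # my_mail@yahoo.com
```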
Aggregate Functions

– Return a single value for each group

SELECT ARTICLE_ID, sum(AMOUNT*PRICE) AS ARTICLE_SUM
FROM SALES_POSITION
GROUP BY ARTICLE_ID;

– Without GROUP BY the whole table is treated as one single group:

SELECT max(BASE_SALES_PRICE), avg(BASE_SALES_PRICE)
FROM ARTICLE;


A GROUP BY clause defines disjoint subsets (groups) of rows in a table.
For each group, the aggregate function returns a single value.
If the GROUP BY clause is omitted and at least one aggregate function is used, the whole table is treated as one single group.
In this case, only aggregate functions and constants are allowed in the SELECT list.
Aggregate Functions continued

– Column positions (numeric values) are supported in the GROUP BY clause:

SELECT ARTICLE_ID, 'total:', sum(AMOUNT*PRICE) ARTICLE_SUM
FROM SALES_POSITION
GROUP BY 1;

– HAVING clause to filter on aggregated values:

SELECT ARTICLE_ID, sum(AMOUNT*PRICE) ARTICLE_SUM
FROM SALES_POSITION
GROUP BY ARTICLE_ID
HAVING local.ARTICLE_SUM > 2000000;


Constants used in the SELECT list do not need to appear in the GROUP BY clause. Numeric values in the GROUP BY clause are interpreted as column numbers of the result set.
The HAVING clause allows filtering on aggregated values.
GROUP_CONCAT
– GROUP_CONCAT([DISTINCT] expr1 [ORDER BY expr2] [SEPARATOR string])
– Aggregate function to concatenate substrings within a group

SELECT
PRODUCT_CLASS,
group_concat(PRODUCT_GROUP_DESC) AS PRODUCT_LIST
FROM ARTICLE
GROUP BY PRODUCT_CLASS;

PRODUCT_CLASS PRODUCT_LIST
1 Alcohol,Alcohol,Alcohol,Alcohol,Alcohol,Alcohol,Al...
2 Household Items,Household Items,Household Items,Ho...


Exasol provides a special aggregate function for string values: GROUP_CONCAT. It concatenates all values of a group into a single string. If no separator is specified, a comma (',') is used. The ORDER BY clause is optional.
GROUP_CONCAT also supports a DISTINCT option. Note that only consecutive identical values are treated as a single one.
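A Python sketch of this behavior models DISTINCT as collapsing runs of consecutive identical values, as described above; the function and parameter names are hypothetical. Combining DISTINCT with an ORDER BY on the same values makes all duplicates consecutive, so they are all removed.

```python
from itertools import groupby


def group_concat(values, distinct=False, order_by=None, separator=","):
    items = list(values)
    if order_by is not None:
        items.sort(key=order_by)                            # optional ORDER BY
    if distinct:
        # only consecutive identical values collapse into one
        items = [value for value, _run in groupby(items)]
    return separator.join(items)


products = ["Alcohol", "Alcohol", "Household Items", "Alcohol"]
print(group_concat(products, distinct=True))                       # Alcohol,Household Items,Alcohol
print(group_concat(products, distinct=True, order_by=str.lower))   # Alcohol,Household Items
```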
GROUPING SETS
– Different aggregation levels in one single query
– Full support of GROUPING SETS (…), CUBE and ROLLUP
SELECT year(SALES_DATE) AS SALES_YEAR,
month(SALES_DATE) AS SALES_MONTH,
sum(PRICE) AS SUM_PRICE,
grouping(local.SALES_YEAR, local.SALES_MONTH) AS grp
FROM SALES
GROUP BY ROLLUP(local.SALES_YEAR, local.SALES_MONTH)
ORDER BY SALES_YEAR, SALES_MONTH NULLS FIRST;

– Levels identified by NULL values or the GROUPING function

SALES_YEAR   SALES_MONTH   SUM_PRICE        GRP
2014         NULL          14838600569.06   1
2014         5             1598826255.70    0
2015         NULL          3578247392.34    1
2015         2             1375016606.70    0
NULL         NULL          18416847961.40   3
…            …             …                …


Different aggregation levels within the same query are possible with GROUPING SETS, with
full support of CUBE and ROLLUP.
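How ROLLUP(year, month) produces the three aggregation levels of the example, and how GROUPING encodes them as a bit mask (2 for a rolled-up year, 1 for a rolled-up month), can be sketched in Python. The input format and function name are hypothetical; `None` plays the role of the NULL that marks a rolled-up column, so input rows are assumed to contain no real NULLs.

```python
from collections import defaultdict


def rollup_year_month(sales):
    """sales: iterable of (year, month, price). Returns {(year, month): (sum, grouping)}."""
    totals = defaultdict(int)
    for year, month, price in sales:
        totals[(year, month)] += price    # detail level
        totals[(year, None)] += price     # month rolled up
        totals[(None, None)] += price     # grand total
    return {
        (year, month): (total, (2 if year is None else 0) + (1 if month is None else 0))
        for (year, month), total in totals.items()
    }
```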
COUNT DISTINCT: The problem

– Very expensive operation
– Example: counting articles sold per market
– Execution time for COUNT DISTINCT:
– ~18 secs on 3.3B records

WITH SA_SP AS
(
...
)
SELECT MARKET_ID, count(DISTINCT ARTICLE_DESCRIPTION)
FROM SA_SP
GROUP BY MARKET_ID;


COUNT DISTINCT is a very expensive operation, since it typically cannot make use of all CPU resources.
Approximate COUNT DISTINCT

– Often no need for precise figures delivered by COUNT DISTINCT
– APPROXIMATE_COUNT_DISTINCT is less expensive
– Using HyperLogLog algorithm
– https://en.wikipedia.org/wiki/HyperLogLog
– Same example with APPROXIMATE_COUNT_DISTINCT:
– ~10 secs on 3.3B records (was ~18 secs before)

WITH SA_SP AS
(
...
)
SELECT MARKET_ID, approximate_count_distinct(ARTICLE_DESCRIPTION)
FROM SA_SP
GROUP BY MARKET_ID;


The precise figures delivered by the expensive COUNT DISTINCT operation are
often not required.
Exasol offers the less expensive APPROXIMATE_COUNT_DISTINCT function for these
cases.
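To illustrate the idea behind HyperLogLog, here is a minimal textbook version in Python. It is not Exasol's implementation; the register count (2^p with p = 10), SHA-1 as the hash and the small-range correction follow the original algorithm description and are assumptions here. With 1,024 registers the typical relative error is about 1.04/√1024 ≈ 3%, while memory stays constant regardless of how many distinct values occur.

```python
import hashlib
import math


def _hash64(value) -> int:
    """Deterministic 64-bit hash (first 8 bytes of SHA-1)."""
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def hll_estimate(values, p=10):
    m = 1 << p                                    # number of registers
    registers = [0] * m
    for value in values:
        h = _hash64(value)
        index = h >> (64 - p)                     # first p bits choose a register
        rest = h & ((1 << (64 - p)) - 1)          # remaining 64 - p bits
        rank = (64 - p) - rest.bit_length() + 1   # position of the leftmost 1-bit
        registers[index] = max(registers[index], rank)
    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    empty = registers.count(0)
    if raw <= 2.5 * m and empty:
        return round(m * math.log(m / empty))     # small-range correction (linear counting)
    return round(raw)


print(hll_estimate(list(range(20000)) * 2))       # close to 20000
```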
Analytical Functions

– Compute an aggregate value based on a window of rows
– Return one value for every row
– Groups (partitions) are defined by the PARTITION BY clause
– No PARTITION BY clause: one single partition => the window contains all rows
– The ORDER BY clause defines the sort order within each partition
– No ORDER BY clause: the window contains all rows (within a partition)
– Using ORDER BY: the window contains all rows from the first one to the current row based on the sort order

SELECT MARKET_ID, SALES_TIMESTAMP, PRICE,
sum(PRICE)
OVER(PARTITION BY MARKET_ID ORDER BY SALES_TIMESTAMP)
FROM SALES;


Analytical functions are designed to address problems such as the following:
• Calculate a running total
• Find percentiles within a group
• Top-N queries
• Compute a moving average
Analytical functions compute an aggregate value based on a group of rows. They differ from aggregate functions in that they do not condense every group to one row, but return a value for every row. Regarding the order of execution, analytical functions are the last set of operations performed in a query except for the final ORDER BY clause. All joins and all WHERE, GROUP BY, and HAVING clauses are completed before the analytical functions are processed. Therefore, analytical functions can appear only in the select list or in the ORDER BY clause.
With the help of PARTITION BY, you can divide the table into multiple partitions based on the specified criteria. Values within each partition are then calculated independently of the rest of the table, similar to the GROUP BY clause. If no PARTITION BY clause is stated, an analytical function always refers to the entire table. The ORDER BY clause specifies the sort order of the rows within each partition. Note that this can differ from the output sort order. If this clause is specified, not all rows of a partition are used for the result computation, but only a part of them, a so-called "window". By default, this window consists of all rows of the respective partition up to the current row with respect to the given sort criteria ("ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW").
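The default window (all rows of the partition up to the current row) yields a running total. A Python sketch over a list of dictionaries, using the column names from the example, shows the mechanics; the function name is hypothetical. It models the ROWS semantics quoted above, so rows with equal sort keys are summed one after another in input order.

```python
from collections import defaultdict


def running_sum(rows, partition_by, order_by, value):
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_by]].append(row)        # PARTITION BY
    result = []
    for part in partitions.values():
        total = 0
        # ORDER BY within the partition; sorted() is stable, so ties keep input order
        for row in sorted(part, key=lambda r: r[order_by]):
            total += row[value]                          # window: first row .. current row
            result.append({**row, "RUNNING_SUM": total})
    return result
```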
User-defined Functions

– Self-created scalar functions
– Oracle-compatible syntax
– Within functions it is possible to:
– Define and use variables
– Use other functions (user-defined or built-in)
– Use control structures (loops, branches)
– Use scalar sub-queries


In Exasol, you can create scalar functions yourself. The syntax is Oracle-compatible.
Within functions, you can define and use variables. The normal SQL data types are valid for variable and parameter declarations. Any scalar SQL expression can be used; for example, all built-in functions are available. You can also use scalar sub-queries, which must not contain any parameters:
a:=(select max(city_id) from cities);
Please note that a function is executed for each row, so it is not recommended to use scalar sub-queries within functions.
Additionally, control structures such as loops or branches can be used in functions:
-- assignment
res := CASE WHEN input_variable<0 THEN 0 ELSE input_variable END;
-- if-branch
IF input_variable = 0 THEN
res := NULL;
ELSE
res := input_variable;
END IF;
-- for loop
FOR cnt:=1 TO input_variable
DO
res:=res*2;
END FOR;
-- while loop
WHILE cnt<=input_variable
DO
res:=res*2;
cnt:=cnt+1;
END WHILE;
User-defined functions: Example
CREATE OR REPLACE FUNCTION distance
(lat1 numeric(9,6), long1 numeric(9,6), lat2 numeric(9,6), long2 numeric(9,6))
RETURNS numeric(9,4)
IS res numeric(9,4);
BEGIN
res:=acos(sin(lat1/180*pi())*sin(lat2/180*pi()) +
cos(lat1/180*pi())*cos(lat2/180*pi())*
cos(long1/180*pi()-long2/180*pi()));
res:=cast(res*6378.137 as numeric(9,4));
RETURN res;
END distance
/
SELECT c1.CITY_NAME AS CITY_NAME1,
c2.CITY_NAME AS CITY_NAME2,
distance(c1.LAT, c1.LON, c2.LAT, c2.LON) AS DISTANCE
FROM CITIES c1
JOIN CITIES c2
ON c1.CITY_NAME < c2.CITY_NAME
;

CITY_NAME1   CITY_NAME2   DISTANCE
Berlin       Leipzig      148.6106
Berlin       Hamburg      255.1255
Hamburg      Leipzig      295.9456


The example above shows how to create a user-defined function (first statement) and then use it in a SELECT (second statement).
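To check results outside the database, the same formula (spherical law of cosines with an earth radius of 6378.137 km) can be written in Python. Unlike the SQL function above, this sketch does not round the intermediate angle to NUMERIC(9,4), and it clamps the cosine to [-1, 1] to guard against floating-point rounding, so its results can differ slightly in the last decimals. The coordinates used below are hypothetical test values.

```python
import math

EARTH_RADIUS_KM = 6378.137


def distance(lat1, long1, lat2, long2):
    """Great-circle distance in km between two points given in degrees."""
    rad = math.pi / 180
    cos_angle = (math.sin(lat1 * rad) * math.sin(lat2 * rad)
                 + math.cos(lat1 * rad) * math.cos(lat2 * rad)
                 * math.cos(long1 * rad - long2 * rad))
    cos_angle = max(-1.0, min(1.0, cos_angle))   # guard against rounding beyond [-1, 1]
    return round(math.acos(cos_angle) * EARTH_RADIUS_KM, 4)


print(distance(0, 0, 0, 1))   # one degree of longitude on the equator, about 111.32 km
```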
UDF Scripts
– Implement your own scalar, aggregate, analytical or generation functions
– Built-in languages: Java, R, Python, Lua
– Languages extendable using containers
– Usage of well-known libraries
– Development using Exasol interfaces (Java) or SDKs (Python, R)


UDF scripts can be used to implement your own scalar, aggregate, analytical or generation functions.
Built-in languages: Java, R, Python, Lua.
The set of languages can be extended using containers.
Well-known libraries can be used.
Development is done using Exasol interfaces (Java) or SDKs (Python, R).
UDF Scripts: Example
– Summing up two values
– Scalar function
– Language Lua

CREATE LUA SCALAR SCRIPT myadd(i int, j int) RETURNS int AS
function run(ctx)
    return ctx.i + ctx.j
end
/

– More information in the Advanced Analytics course
– Best practices, SDK usage, …


The example above shows a UDF script written in Lua that implements a scalar function adding two input values.
More details about UDF scripts are given in the Exasol Academy course Advanced Analytics.
LIMIT clause
– Way to limit a result set
– Result is not deterministic

SELECT *
FROM SALES
LIMIT 10;

– LIMIT clause may be specified with an offset

SELECT *
FROM SALES
ORDER BY SALES_ID
LIMIT 10 OFFSET 10;


In Exasol, it is possible to limit a result set without having to specify an ORDER BY clause. The result of such a query is not deterministic, though.
The LIMIT clause can be combined with an OFFSET clause.
Row numbering

– ROWNUM
– An (arbitrary) row number between 1 and n based on internal storage
– n is the number of rows in the table or subselect

– ROWID
– A unique address of a row inside a table
– The address is reassigned on any DML

– ROW_NUMBER()
– Analytical function which uniquely numbers the rows according to given sort criteria and partitioning


ROWNUM
ROWNUM is a pseudo column which numbers the records of a table or subselect, beginning with 1. It has certain restrictions on its usage.
ROWID
Every row of a table in the database has a unique address, the so-called ROWID
(DECIMAL(36,0) data type). The ROWIDs of a table are managed by the DBMS. This ensures
that the ROWIDs within a table are distinct – in contrast, it is quite acceptable for ROWIDs
of different tables to be the same. Using DML statements such as
INSERT, UPDATE, DELETE, TRUNCATE or MERGE, all the ROWIDs of the relevant tables are
invalidated and reassigned by the DBMS. In contrast to that, structural table changes such
as ALTER TABLE ADD COLUMN, will leave the ROWIDs unchanged. The ROWID pseudo
column is only valid for real tables, not for views or subselects.
An example of using ROWIDs would be the deletion of specific rows in a table, e.g. in order
to restore the UNIQUE property of a compound key where no other criterion is available to
distinguish between rows.
DENSE_RANK, RANK and ROW_NUMBER
These functions take no arguments, and the ORDER BY clause is required. They return the rank or row number within the partition, with the ORDER BY clause determining the ranking or numbering.
For equal values in the sort expression, both RANK and DENSE_RANK return a common rank for the affected rows; however, DENSE_RANK does not skip the following values as RANK does. ROW_NUMBER always returns unique numbers; rows with equal values are numbered in an arbitrary order.
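A small Python sketch makes the difference between the three functions visible on a list containing a tie (the helper name is hypothetical). Because Python's sort is stable, tied rows get row numbers in input order here, whereas Exasol gives no such guarantee.

```python
def ranking(values):
    """Return (value, ROW_NUMBER, RANK, DENSE_RANK) tuples in ascending sort order."""
    ordered = sorted(values)
    result = []
    for i, value in enumerate(ordered):
        row_number = i + 1
        if i > 0 and value == ordered[i - 1]:
            _, _, rank, dense_rank = result[-1]                  # tie: reuse previous ranks
        else:
            rank = row_number                                    # RANK skips after ties
            dense_rank = result[-1][3] + 1 if result else 1      # DENSE_RANK does not
        result.append((value, row_number, rank, dense_rank))
    return result


for row in ranking([30, 20, 10, 20]):
    print(row)
```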
Row numbering: ROWNUM

– ROWNUM

SELECT SALES_ID, ROWNUM
FROM SALES
WHERE ROWNUM < 10;

SALES_ID    ROWNUM
389577429   1
321740964   2
389577438   3
253762032   4
321740973   5
389577447   6
458941123   7
389577513   8
389577522   9


The example above shows how to use ROWNUM.

Row numbering: ROWID

– ROWID

SELECT SALES_ID, ROWID
FROM SALES
LIMIT 10;

SALES_ID    ROWID
389577429   321098139506691362218450234437009408
321740964   321098139506691362218450234437009409
389577438   321098139506691362218450234437009410
253762032   321098139506691362218450234437009411
321740973   321098139506691362218450234437009412
320831004   321098139506691362218731709413720064
252015775   321098139506691362218731709413720065
388667398   321098139506691362218731709413720066
388667407   321098139506691362218731709413720067


The example above shows how to use ROWID.

Row numbering: ROW_NUMBER()

– ROW_NUMBER

SELECT * FROM
( SELECT SALES_ID, PRICE,
row_number() over (ORDER BY PRICE DESC) AS MY_ROWNUM
FROM SALES
)
WHERE MY_ROWNUM <= 5;

SALES_ID    PRICE    MY_ROWNUM
243303093   625.41   1
388393990   622.89   2
26495229    621.33   3
161127730   601.47   4
35297065    587.98   5


The example above shows how to use the ROW_NUMBER function to select the five sales with the highest price.

Constraint checks

– PRIMARY KEY check

SELECT CITY_ID, CITY_NAME, ZIP_CODE
WITH INVALID PRIMARY KEY( CITY_ID )
FROM CITIES;

CITY_ID   CITY_NAME   ZIP_CODE
10729     Hemmerde    59427
NULL      Hannover    30519
10729     Unna        59427
10729     Unna        59427

– UNIQUE check

SELECT CITY_ID, CITY_NAME, ZIP_CODE
WITH INVALID UNIQUE( CITY_ID, CITY_NAME, ZIP_CODE )
FROM CITIES;

CITY_ID   CITY_NAME   ZIP_CODE
10729     Unna        59427
10729     Unna        59427

– FOREIGN KEY check

SELECT MARKET_ID
WITH INVALID FOREIGN KEY( CITY_ID )
FROM MARKETS
REFERENCING CITIES( CITY_ID );

MARKET_ID
1803
1802


These queries show which rows would violate a constraint if that constraint were enabled.
The statements do not create a constraint, though.

Verification of the primary key property (PRIMARY KEY)
This construct can be used to check whether a number of columns have the primary key property. This is the case if the specified columns do not contain duplicate data records and contain no NULL values. Rows that do not conform to the primary key property are selected.
Verification of the uniqueness (UNIQUE)
This construct can be used to verify whether the rows of a number of columns cols are unique. This is the case if the specified columns cols do not contain duplicate data records. Rows in the specified columns cols which only contain NULL values are classified as being unique (even if there is more than one). Non-unique rows are selected.
Verification of the foreign key property (FOREIGN KEY)
This construct can be used to check whether a number of columns in a table (TABLE) possess the foreign key property (FOREIGN KEY) in relation to another table (REF_TABLE). This is the case if the specified columns do not contain NULL values and the column value of each row from TABLE also exists as a row in the specified columns of the referenced table (REF_TABLE). Rows that violate the foreign key property are selected.
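The three checks can be mimicked in Python on lists of row dictionaries, which may help to see exactly which rows each check returns. The function names are hypothetical, and treating partially NULL keys as ordinary values in the UNIQUE check is an assumption the text above does not settle. The sample rows in the test reuse the values from the slide plus hypothetical extra rows.

```python
from collections import Counter


def _key(row, cols):
    return tuple(row[c] for c in cols)


def invalid_primary_key(rows, cols):
    """Rows whose key contains NULL or occurs more than once."""
    counts = Counter(_key(row, cols) for row in rows)
    return [row for row in rows
            if any(row[c] is None for c in cols) or counts[_key(row, cols)] > 1]


def invalid_unique(rows, cols):
    """Rows whose key occurs more than once; all-NULL keys count as unique."""
    counts = Counter(_key(row, cols) for row in rows)
    return [row for row in rows
            if not all(row[c] is None for c in cols) and counts[_key(row, cols)] > 1]


def invalid_foreign_key(rows, cols, referenced_keys):
    """Rows whose key contains NULL or is missing from the referenced key set."""
    return [row for row in rows
            if any(row[c] is None for c in cols) or _key(row, cols) not in referenced_keys]
```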
MERGE

– UPDATE, INSERT and DELETE in one single statement
– Integrates data from source table into target table

MERGE INTO CITIES c
USING NEW_CITIES n
ON (c.CITY_ID = n.CITY_ID)
WHEN MATCHED THEN UPDATE SET c.AREA = n.AREA,
c.AREA_SHORT = n.AREA_SHORT
DELETE WHERE TODO = 'DELETE'
WHEN NOT MATCHED THEN INSERT VALUES
(n.CITY_ID, n.COUNTRY_CODE, n.ZIP_CODE, n.CITY_NAME,
n.DISTRICT, n.AREA, n.AREA_SHORT, n.LAT, n.LON)
;


This statement combines UPDATE, DELETE and INSERT and is a powerful method for data
manipulation, especially within ETL tasks.
The ON condition describes the correlation between the two tables (similar to a join). The
MATCHED clause is used for matching row pairs, the NOT MATCHED clause is used for those
where no match is found. Only equivalence conditions (=) are permitted in the ON
condition.
UPDATE clause: the optional WHERE condition specifies the circumstances under which
the UPDATE is conducted, whereby it is permissible for both the target table and the
source table to be referenced for this.
With the aid of the optional DELETE condition it is possible to delete rows in the target
table. Only rows that have been changed are taken into account and only values after the
UPDATE are available for conditions.
DELETE clause: the optional WHERE condition specifies the circumstances under which the
DELETE is conducted.
INSERT clause: the optional WHERE condition specifies the circumstances under which the
INSERT is conducted. In this respect, it is only permissible to reference the columns of the
source table.
Notes:
• The source table can be a physical table, a view or a subquery.
• The UPDATE or DELETE and INSERT clauses are optional, with the restriction that at least one must be specified. The order of the clauses can be exchanged.
• If several rows of the source table could apply to an UPDATE of a single row in the target table, this leads to the error message "Unable to get a stable set of rows in the source tables" if the original value of the target table would be changed by the UPDATE candidates.
• An update of columns used in the ON condition is not allowed.
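The matching logic described in these notes can be modeled in a few lines of Python, which may help when reasoning about which rows end up in the target. This is a simplified sketch, not Exasol's implementation: tables are lists of dictionaries, `merge` and its parameters are hypothetical names, the TODO column mirrors the example, the stability check only compares competing update values, and unmatched source keys are assumed to be unique.

```python
def merge(target_rows, source_rows, key, update_cols, delete_if=None):
    """Model MERGE INTO target USING source ON (target.key = source.key)."""
    if key in update_cols:
        raise ValueError("columns of the ON condition cannot be updated")
    result = {row[key]: dict(row) for row in target_rows}
    matched_keys = set(result)                 # matching uses the original target rows
    updates = {}
    for src in source_rows:
        k = src[key]
        if k in matched_keys:                  # WHEN MATCHED THEN UPDATE
            new_values = {col: src[col] for col in update_cols}
            if k in updates and updates[k] != new_values:
                raise ValueError("Unable to get a stable set of rows in the source tables")
            updates[k] = new_values
        else:                                  # WHEN NOT MATCHED THEN INSERT
            result[k] = dict(src)
    for k, new_values in updates.items():
        result[k].update(new_values)
        if delete_if is not None and delete_if(result[k]):
            del result[k]                      # DELETE WHERE sees the updated values
    return list(result.values())
```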
Comments on Database Objects

– Example: Comment on a table and columns

COMMENT ON TABLE SALES IS 'All sales';
COMMENT ON COLUMN SALES.SALES_ID IS 'Sales ID';
COMMENT ON COLUMN SALES.SALES_DATE IS 'Date of sales';
COMMENT ON COLUMN SALES.PRICE IS 'Sum of sales';

DESCRIBE FULL SALES;

COLUMN_NAME … COLUMN_COMMENT
SALES_ID … Sales ID
SALES_DATE … Date of sales
PRICE … Sum of sales
… … …


Comments can be added to all database objects:
• Users
• Roles
• Connections
• Schemas
• Tables
• Columns
• Views
• Functions
• Scripts

Comments can be assigned either with a dedicated command (COMMENT ON … IS, for all objects except views) or when creating an object (tables and views). Comments on views and their columns can only be given when the view is created.
