02 Modul Exasol SQL - en

Exasol SQL
– Exasol supports the major part of the SQL Standard 2008

– Some features are not (yet) supported, e.g. CURSORS
– Some extensions
– DQL: SELECT, UNION [ALL], INTERSECT / MINUS
– DCL: CREATE USER, CREATE ROLE, GRANT PRIVILEGES
– DDL: CREATE SCHEMA, CREATE TABLE, CREATE VIEW
– DML: INSERT, UPDATE, DELETE/TRUNCATE, MERGE, IMPORT/EXPORT
Exasol SQL
Database definition (DDL)

The structure of the database is defined by means of the Data Definition Language (DDL).
This concerns, in particular, the creation of schemas and all of the schema objects
contained therein such as tables, views, functions and scripts. It also includes constraints
on the tables, but content of tables is not defined via DDL.
Access control (DCL)
The SQL statements of the Data Control Language (DCL) are used to control the database
access rights. Through the management of users and roles as well as the granting of
privileges, it can be determined who is permitted to perform what actions in the database.
Manipulation of the database (DML)
The content of tables can be changed using the Data Manipulation Language (DML).
Important statements are:
INSERT Inserts constant values and subquery results.
UPDATE Updates the content of a table. You can restrict the rows to be updated with a
WHERE condition.
DELETE Deletes certain rows of a table.
In case of an error during the update process, no ROLLBACK is executed.
TRUNCATE Deletes the whole content of a table. In contrast to e.g. Oracle, only the
content is deleted; privileges are not affected.
MERGE Combines the three commands UPDATE, DELETE and INSERT (see the MERGE section).
Query language (DQL)
By the use of the Data Query Language (DQL) you can analyze the database (SELECT
statement).
Important constructs are:
Table operators UNION [ALL], INTERSECT, MINUS
Identifiers
– Each object in the database has a unique name
[Diagram: object hierarchy, top to bottom: Database / Root; User, Role, Connection, Catalog; Schema; Table, View, Function, Script; Constraints, Columns]
– These names are referenced via SQL identifiers

– SQL identifiers: 128 characters
– Different handling of regular and delimited identifiers
Identifiers
Every database object has its own (local) unique name.

Within SQL commands, those names are referenced via SQL identifiers.
SQL identifiers are limited to 128 characters.
The handling of regular and delimited identifiers is different.
Regular identifiers
– Defined without quotes

– Case-insensitive
– Must not contain the following ASCII characters:
"'+-*/<>={}[]().,;:?|&^%!
– Stored in the database in upper case:

Identifier   Name in database
ABC          ABC
aBc          ABC
a123         A123
CREATE TABLE aBc (…);
SELECT * FROM abc;
Regular identifiers
Regular identifiers are stated without quotation marks. They must start with a letter
(Unicode classes Lu, Ll, Lt, Lm, Lo and Nl). For the subsequent characters, the Unicode classes
Mn, Mc, Nd, Pc and Cf are additionally allowed. This complies with the SQL standard.
For German-speaking users this means that umlauts are also allowed as part of regular
identifiers.
A further restriction besides the character set is that reserved words cannot be used as a
regular identifier. If you want to use characters which are prohibited for regular
identifiers, you can use delimited identifiers (see next section). E.g. if you want to use the
word table as identifier, you have to quote it ("TABLE"), since it is a reserved keyword.
Regular identifiers are always stored in the database in upper case. Therefore, they are
not case sensitive in SQL text. As shown in the above example, the two identifiers (ABC)
and (aBc) are identical.
Delimited identifiers
– Enclosed in double quotation marks: "abc"

– Case-sensitive
– May contain any character except the dot ('.')
– Exception: For users and roles, the dot is allowed (e.g. email addresses)
– Can be reserved keywords (except "ROWNUM")
– Stored case-sensitive in the database
– Exception: User and roles are always case-insensitive (upper-case)
Identifier   Name in database
"ABC"        ABC
"abc"        abc
"_x_"        _x_
"ab""c"      ab"c
CREATE TABLE "abc" (…);
SELECT * FROM "abc";
Delimited identifiers
These identifiers are names enclosed in double quotation marks. Any character can be
contained within the quotation marks except the dot ('.'). If you want to use a quotation
mark in the name, it must be doubled (e.g. "ab""c" indicates the name ab"c).
Except for users, roles and passwords, identifiers in quotation marks are always stored
case-sensitively in the database.
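The two naming rules can be modelled outside the database. The following is a minimal Python sketch, not Exasol code: the function name `stored_name` is hypothetical, and the sketch ignores the 128-character limit, the forbidden characters, reserved words and the special handling of users and roles.

```python
def stored_name(identifier: str) -> str:
    """Return the name the database would store for an SQL identifier (model only)."""
    if len(identifier) >= 2 and identifier.startswith('"') and identifier.endswith('"'):
        # Delimited identifier: strip the outer quotes, un-double inner quotes,
        # and keep the case exactly as written.
        return identifier[1:-1].replace('""', '"')
    # Regular identifier: folded to upper case.
    return identifier.upper()


for ident in ['ABC', 'aBc', 'a123', '"ABC"', '"abc"', '"_x_"', '"ab""c"']:
    print(f'{ident:10} -> {stored_name(ident)}')
```

Under this model, abc and "ABC" refer to the same object, while abc and "abc" do not.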
Reserved keywords
– Cannot be used as regular identifiers
– Usually part of the SQL grammar
– List of reserved words in EXA_SQL_KEYWORDS
– Examples:
"SELECT" "YES" "USER" "DATE" "TIME" …
– EXAplus: Syntax-Highlighting for reserved keywords
SELECT 14 AS "IDENTITY";
SELECT current_user AS "USER";
Reserved keywords
There are a number of reserved words in Exasol which cannot be used as regular identifiers.
For example, the keyword 'SELECT' is a reserved word. A table can only be created with
this name if the name is put in double quotation marks.
Note, however, that "SELECT" as a table name differs from table names such as "Select" or "seLect".
The list of reserved words can be found in the EXA_SQL_KEYWORDS system table:
SELECT * FROM exa_sql_keywords WHERE reserved=true;
Exasol Data Types
BOOLEAN
CHAR(n) n in [1;2,000]
DATE
DECIMAL(p,s) p in [1;36], s in [0;p]
DOUBLE PRECISION
GEOMETRY[(srid)] srid defines the spatial reference system
INTERVAL DAY [(p)] TO SECOND [(fp)] p in [1;9], fp in [0;9] accurate to a millisecond
INTERVAL YEAR [(p)] TO MONTH p in [1;9]

TIMESTAMP Accurate to a millisecond
TIMESTAMP WITH LOCAL TIME ZONE Aware of the session time zone
VARCHAR(n) n in [1;2,000,000]
– Several aliases: DOUBLE = DOUBLE PRECISION, INT = DEC(18,0), …
Exasol Data Types
Common data types are supported:
BOOLEAN
CHAR
VARCHAR
DECIMAL
DOUBLE PRECISION
DATE
INTERVAL
TIMESTAMP
Notice that DATE doesn’t include hours, minutes and seconds in Exasol.
Not supported types:

Blob
Binary (GEOMETRY)
Clob (use VARCHAR)
Data Type: String
– CHAR(n [CHAR]) [[CHARACTER SET] encoding]

– Encoding: UTF-8 or ASCII (7 bit)
– Maximal length: 2,000 characters
– Fixed-sized and padded with whitespace
– VARCHAR(n [CHAR]) [[CHARACTER SET] encoding]

– Encoding: UTF-8 or ASCII (7 bit)
– Maximal length: 2 million characters
– Variable-sized, depending on actual string length: 'a' is different to 'a '
– An empty string ('') is translated to a NULL value
– String literals are enclosed in single quotes
– String comparison is always case-sensitive!
-- string concatenation
SELECT 'hello'||'!';
Data Type: String
The two string types can store up to 2,000 (CHAR) and 2 million (VARCHAR) characters, either in ASCII or UTF-8 encoding.
In UTF-8, 2 million characters can consume up to 8 million bytes, while ASCII
cannot contain any country-specific characters.
The conversion into the (designated) output encoding of the client is done by the
corresponding driver.
The database can store and process all existing characters of a certain character set,
independent of the client's ability to display the characters. For example, EXAplus
converts linefeeds (\n) into spaces to preserve tabular output formatting.
String literals are automatically interpreted with the best-fitting data type. That's why an
explicit cast is usually not necessary.
Any data type can be converted to a string with the function TO_CHAR. For the
opposite direction you can use the functions TO_DATE, TO_TIMESTAMP and TO_NUMBER.
The optional parameter 'format' defines how the data should be interpreted. More details
on format strings can be found in the User Manual, Section 2.5 "Format Models".
The length of a string can be determined via the LENGTH function family (either the
number of characters, the number of bytes or the number of bits).
Data Type: DATE
– DATE:
– Consists of (year, month, day).
– Valid range is '0001-01-01' to '9999-12-31'.
– Either a completely valid date or NULL.
– Values like '2009-01-00' or '2009-02-30' are not possible!
– Current date:
– SYSDATE (database time zone)
– CURRENT_DATE (session time zone)
Data Type: DATE
The DATE data type stores a date consisting of day, month and year (4 digits). Only valid date
values in the range of 0001-01-01 to 9999-12-31 are allowed.
It is not possible to omit fields (day/month) or to set them to invalid values (0, >12/31).
Therefore a date is either completely valid or a NULL value. Values like '2009-01-00' or
'2009-02-30' are not allowed and lead to an 'invalid date value' data exception.
Data Type: TIMESTAMP
– TIMESTAMP:
– Consists of (Date, Time accurate to a millisecond).
– Time from '00:00:00' to '23:59:59.999'
– Current timestamp:
– SYSTIMESTAMP (database time zone)
– CURRENT_TIMESTAMP, LOCALTIMESTAMP, now() (session time zone)
– TIMESTAMP WITH LOCAL TIME ZONE:

– Internal normalization to UTC timestamp
– For any in- or output, the UTC timestamp is implicitly converted to a timestamp in the session time zone
– This may lead to different results for the same query within different sessions:
– Example: INSERT … VALUES ('2018-01-05 20:15:00.000',…) in one session; SELECT … in a session with a different time zone returns 2018-01-06 04:15:00.000
Data Type: TIMESTAMP
The TIMESTAMP data type is similar to the DATE data type, but also contains a time (hours,
minutes, seconds, milliseconds). The valid range is [0001-01-01 00:00:00.000; 9999-12-31
23:59:59.999]. Invalid values lead to an 'invalid time value' data exception.
Datetime literals are created by specifying the corresponding keyword (date/timestamp),
followed by a string with the corresponding value (ISO format YYYY-MM-DD HH:MI:SS.FF3).
NULL values can be inserted by using the empty string (timestamp '').
WHERE x = date '2009-01-16'
WHERE y = timestamp '2009-03-12 12:37:23.003'
Ordinary strings are automatically converted into the corresponding data type if needed.
Please note that this conversion uses the current datetime formats for displaying and
interpreting datetime strings (NLS_DATE_FORMAT, NLS_TIMESTAMP_FORMAT).
WHERE x = '2009-01-16'
WHERE x = cast('16.01.2009' as timestamp)
If the string does not match the current format, you can specify your own format
when using the conversion functions TO_DATE and TO_TIMESTAMP.
WHERE x = to_date('16.01.2009', 'DD.MM.YYYY')
Regular subselect
– Example:
SELECT * FROM (
SELECT ma.*, ci.CITY_NAME AS CITY_NAME
FROM MARKETS ma JOIN CITIES ci
ON ma.CITY_ID = ci.CITY_ID
WHERE ci.CITY_NAME LIKE 'Neu%'
)
WHERE POPULATION > 10000;
Regular subselect
Subselects can be used like tables in SQL statements as shown above.

Common Table Expression (CTE):
– Example:
WITH CUSTOMERS_IN_CITIES_STARTING_WITH_NEU AS
(
SELECT ma.*, ci.CITY_NAME AS CITY_NAME
FROM MARKETS ma JOIN CITIES ci
ON ma.CITY_ID = ci.CITY_ID
WHERE ci.CITY_NAME LIKE 'Neu%'
)
SELECT * FROM CUSTOMERS_IN_CITIES_STARTING_WITH_NEU
WHERE POPULATION > 10000;
Common Table Expression (CTE)
If a subquery is used multiple times within the same query, the usage of CTEs is
recommended.
They can be used for the combination of iterative queries, too:

with a as (select …),
b as (select … from a),
c as (select … from b, a),
…
select * from z;
Correlated subselect
– Example: (all markets from cities with > 5 markets)

– Main usage: IN or EXISTS
WITH cte1 AS (
SELECT count(1) AS MARKET_COUNT, CITY_ID
FROM MARKETS
GROUP BY CITY_ID
HAVING count(1) > 5
)
SELECT ma.*
FROM MARKETS ma WHERE EXISTS
( SELECT 1 FROM cte1
WHERE cte1.CITY_ID = ma.CITY_ID
);
Correlated subselect
Conceptually, the correlated subselect in the example above is evaluated for each row of the outer query.
Aliases: The problem
– Aliases cannot be used within the same subselect for:

– GROUP BY
– WHERE
– CASE WHEN
– HAVING

SELECT count(1) AS MARKET_COUNT, CITY_ID
FROM MARKETS
GROUP BY CITY_ID
HAVING MARKET_COUNT > 5
;
> [42000] object MARKET_COUNT not found [line 4, column 8]
Aliases: The problem
According to the SQL-Standard, aliases cannot be used within the same subselect for:
GROUP BY
WHERE
CASE WHEN
HAVING
Aliases: Solution with LOCAL
– The keyword 'local' allows aliases to be used

SELECT count(1) AS MARKET_COUNT, CITY_ID
FROM MARKETS
GROUP BY CITY_ID
HAVING local.MARKET_COUNT > 5
;
MARKET_COUNT CITY_ID
6 9661
10 9671
6 4897
10 9652
8 9672
Aliases: Solution with LOCAL
Exasol provides the keyword 'local' to reference aliases, as shown in the example above.
CURRENT_USER vs. SCOPE_USER
– CURRENT_USER shows the currently connected user
SELECT CURRENT_USER;
CURRENT_USER
JIM
– SCOPE_USER shows the owner of the current object

– new in Exasol Version 6.1
ALTER SCHEMA X CHANGE OWNER BARBARA;
CREATE VIEW X.USER_TEST AS SELECT CURRENT_USER, SCOPE_USER;
SELECT * FROM USER_TEST;
CURRENT_USER SCOPE_USER
JIM BARBARA
CURRENT_USER vs. SCOPE_USER
CURRENT_USER shows the username of the currently connected user.

SCOPE_USER (new in Exasol Version 6.1) shows the owner of the current object (for views and
UDF scripts).
Scalar Functions
– Computed for each row of the input table

– Available for several input types:
– Numeric: SELECT power(2,10);
➢ 1024
– Strings: SELECT substr('abcdefg', 2, 5);
➢ 'bcdef'
– Date / Timestamp: SELECT year(sysdate) AS current_year;
➢ 2016
– Geometry: SELECT st_distance('POINT(0 0)', 'POINT(3 4)');
➢ 5
– …
Scalar Functions
Scalar functions are computed separately for each row of the input table.
These functions are available for several input types, as shown above.
Explicit Conversion between Date Types
SELECT TO_CHAR(current_date, 'DD. MONTH YYYY', 'NLS_DATE_LANGUAGE=GERMAN');

SELECT TO_DATE('08222016', 'MMDDYYYY');
SELECT TO_TIMESTAMP('08222016 18:36:55', 'MMDDYYYY HH:MI:SS');
-- Format and language optional, default is NLS_DATE_FORMAT,
-- NLS_TIMESTAMP_FORMAT, NLS_DATE_LANGUAGE (from session parameter)
Special format elements
Date        YYYY  WW  IYYY  IW
2008-12-31  2008  53  2009  01
2009-01-01  2009  01  2009  01
2005-12-31  2005  53  2005  52
2006-01-01  2006  01  2005  52
Explicit Conversion between Date Types
Datetime data types can be explicitly converted among each other and into string data
types. A conversion from and to UNIX timestamp can be done by its definition (seconds
since 01.01.1970) and the use of functions SECONDS_BETWEEN and ADD_SECONDS.
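The arithmetic behind the UNIX timestamp conversion can be sketched in Python. This is an illustration of the definition only, not Exasol code: the helper names `to_unix` and `from_unix` are hypothetical, and time zones are deliberately left out.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # 01.01.1970 00:00:00, time zone handling omitted


def to_unix(ts: datetime) -> int:
    """Seconds between the epoch and ts (the role SECONDS_BETWEEN plays in SQL)."""
    return int((ts - EPOCH).total_seconds())


def from_unix(seconds: int) -> datetime:
    """Epoch plus a number of seconds (the role ADD_SECONDS plays in SQL)."""
    return EPOCH + timedelta(seconds=seconds)


ts = datetime(2016, 8, 22, 18, 36, 55)
secs = to_unix(ts)
print(secs, from_unix(secs))
```

Converting a timestamp to seconds and back yields the original timestamp, which is a quick sanity check for either direction.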
One special case exists for the following format elements:
'WW' for week
'IW' for ISO week
'IYYY' for ISO year
The difference is that the WW format counts days starting from 1 January, while the
IW format always counts weeks from Monday to Sunday (the first week of a year has to have at
least 4 days). That means that the first or last days of a year can be assigned to the previous or
the following year.
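The four sample dates from the slide can be recomputed with a short Python sketch. It assumes that 'WW' counts blocks of seven days starting on 1 January, as described above, and uses Python's ISO 8601 calendar for 'IYYY' and 'IW'; the function names are hypothetical.

```python
from datetime import date


def ww(d: date) -> int:
    """'WW'-style week: days counted in blocks of seven from 1 January."""
    return (d.timetuple().tm_yday - 1) // 7 + 1


def iso_year_week(d: date) -> tuple:
    """'IYYY' and 'IW': ISO 8601 year and week (weeks run Monday to Sunday)."""
    iso = d.isocalendar()
    return iso[0], iso[1]


for d in [date(2008, 12, 31), date(2009, 1, 1), date(2005, 12, 31), date(2006, 1, 1)]:
    iyyy, iw = iso_year_week(d)
    print(d, d.year, f'{ww(d):02}', iyyy, f'{iw:02}')
```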
Using CAST
– Can be used to transform values into any other format:
SELECT cast(42 AS char(10)) char_col;

➢ '42 '
Using CAST
In addition to the standard conversion functions, CAST can be used to flexibly transform
values into any other required format.
Regular Expressions: REGEXP_INSTR
– REGEXP_INSTR, REGEXP_SUBSTR and REGEXP_REPLACE offer regular expressions

– Example: Is there a bracket in a string?
SELECT
DISTINCT PRODUCT_GROUP_DESC,
CASE REGEXP_INSTR(PRODUCT_GROUP_DESC, '[()]')
WHEN 0 THEN false ELSE true
END AS HAS_BRACKET
FROM ARTICLE;
PRODUCT_GROUP_DESC HAS_BRACKET
Frozen Foods false
Drinks (returnable bottles) true
Regular Expressions: REGEXP_INSTR
The functions INSTR, SUBSTR and REPLACE only work with constant strings.
Patterns, however, can be passed to the functions that use regular expressions.
Exasol supports Perl Compatible Regular Expressions.
The example above shows the usage of REGEXP_INSTR().

Regular Expressions: REGEXP_REPLACE
– Example: Replace all vowels
SELECT
DISTINCT PRODUCT_GROUP_DESC,
REGEXP_REPLACE(PRODUCT_GROUP_DESC, '[aeiou]', '#')
AS REPLACE_VOWELS
FROM ARTICLE;
PRODUCT_GROUP_DESC REPLACE_VOWELS
Frozen Foods Fr#z#n F##ds

Drinks (returnable bottles) Dr#nks (r#t#rn#bl# b#ttl#s)
Regular Expressions: REGEXP_REPLACE
The example above replaces all vowels with the # character.

Regular Expressions: REGEXP_SUBSTR
– Example: Extraction of email address
SELECT REGEXP_SUBSTR
('My mail address is my_mail@yahoo.com',
'(?i)[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,4}')
AS EMAIL;
EMAIL
my_mail@yahoo.com
Regular Expressions: REGEXP_SUBSTR
Above example shows how an email address can be extracted from a string field.
Aggregate Functions
– Return a single value for each group

SELECT ARTICLE_ID, sum(AMOUNT*PRICE) AS ARTICLE_SUM
FROM SALES_POSITION
GROUP BY ARTICLE_ID;
– Without GROUP BY the whole table is treated as one single group:
SELECT max(BASE_SALES_PRICE), avg(BASE_SALES_PRICE)

FROM ARTICLE;
Aggregate Functions
A GROUP BY clause defines disjoint subsets (groups) of rows in a table.

For each group, the aggregate function returns a single value.
If the GROUP BY clause is omitted and at least one aggregate function is used, the whole
table is treated as one single group.
In this case only aggregations and constants are allowed.
Aggregate Functions continued
– Numeric values supported in GROUP BY clause:

SELECT ARTICLE_ID, 'total:', sum(AMOUNT*PRICE) ARTICLE_SUM
FROM SALES_POSITION
GROUP BY 1;
– HAVING clause to filter on aggregated values:
SELECT ARTICLE_ID, sum(AMOUNT*PRICE) ARTICLE_SUM

FROM SALES_POSITION
GROUP BY ARTICLE_ID
HAVING local.ARTICLE_SUM > 2000000;
Aggregate Functions continued
Constants used in the SELECT list do not need to appear in the GROUP BY clause. Numeric
values will be interpreted as column numbers of the result set.
The HAVING clause allows filtering on aggregated values.
GROUP_CONCAT
– GROUP_CONCAT([DISTINCT] expr1 [ORDER BY expr2] [SEPARATOR string])
– Aggregate function to concatenate substrings within a group
SELECT
PRODUCT_CLASS,
group_concat(PRODUCT_GROUP_DESC) AS PRODUCT_LIST
FROM ARTICLE
GROUP BY PRODUCT_CLASS;
PRODUCT_CLASS PRODUCT_LIST
1 Alcohol,Alcohol,Alcohol,Alcohol,Alcohol,Alcohol,Al...
2 Household Items,Household Items,Household Items,Ho...
GROUP_CONCAT
Exasol provides you with a special aggregate function for string values: GROUP_CONCAT.
With its help you can concatenate all values of the group to a single string. If no separator
is specified, a comma ',' will be used. The ORDER BY clause is optional.
GROUP_CONCAT also supports a DISTINCT option. Please note that only consecutive
identical values will be treated as a single one.
GROUPING SETS
– Different aggregation levels in one single query
– Full support of GROUPING SETS (…), CUBE and ROLLUP
SELECT year(SALES_DATE) AS SALES_YEAR,
month(SALES_DATE) AS SALES_MONTH,
sum(PRICE) AS SUM_PRICE,
grouping(local.SALES_YEAR, local.SALES_MONTH) AS grp
FROM SALES
GROUP BY ROLLUP(local.SALES_YEAR, local.SALES_MONTH)
ORDER BY SALES_YEAR, SALES_MONTH NULLS FIRST;
– Levels identified by NULL-Values or GROUPING function

SALES_YEAR  SALES_MONTH  SUM_PRICE       GRP
2014                     14838600569.06  1
2014        5            1598826255.70   0
2015                     3578247392.34   1
2015        2            1375016606.70   0
                         18416847961.40  3
…           …            …               …
GROUPING SETS
Different aggregation levels within the same query are possible with GROUPING SETS, with
full support of CUBE and ROLLUP.
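How the three aggregation levels of the ROLLUP example relate to the GRP column can be sketched in Python. This is a model under assumptions, not Exasol code: the sample rows are invented, and GROUPING is assumed to return a bit vector in which a set bit marks a column that has been rolled up (which matches the values 0, 1 and 3 in the result above).

```python
from collections import defaultdict

# Hypothetical sample rows: (sales_year, sales_month, price)
rows = [(2014, 5, 10.0), (2014, 6, 5.0), (2015, 2, 7.0)]


def rollup(rows):
    """Emulate GROUP BY ROLLUP(year, month) with a GROUPING(year, month) bit vector."""
    totals = defaultdict(float)
    for year, month, price in rows:
        totals[(year, month, 0)] += price   # detail level: nothing rolled up -> 0b00
        totals[(year, None, 1)] += price    # month rolled up                 -> 0b01
        totals[(None, None, 3)] += price    # year and month rolled up        -> 0b11
    return [(y, m, total, grp) for (y, m, grp), total in totals.items()]


for row in rollup(rows):
    print(row)
```

None stands for the NULL values that mark a rolled-up column in the SQL result.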
COUNT DISTINCT: The problem
– Very expensive operation
– Example: counting articles sold per market:

– Execution time for COUNT DISTINCT:
– ~18 secs on 3.3B records
WITH SA_SP AS
(
...
)
SELECT MARKET_ID, count(DISTINCT ARTICLE_DESCRIPTION)
FROM SA_SP
GROUP BY MARKET_ID;
COUNT DISTINCT: The problem
COUNT DISTINCT is a very expensive operation, since it typically cannot utilize all CPU
resources.
Approximate COUNT DISTINCT
– Often no need for precise figures delivered by COUNT DISTINCT

– APPROXIMATE_COUNT_DISTINCT is less expensive
– Using HyperLogLog algorithm
– https://en.wikipedia.org/wiki/HyperLogLog
– Same example with APPROXIMATE_COUNT_DISTINCT:
– ~10 secs on 3.3B records (was ~18 secs before)
WITH SA_SP AS
(
...
)
SELECT MARKET_ID, approximate_count_distinct(ARTICLE_DESCRIPTION)
FROM SA_SP
GROUP BY MARKET_ID;
Approximate COUNT DISTINCT
The precise figures delivered by the expensive COUNT DISTINCT operation are
often not required.
Exasol offers the less expensive APPROXIMATE_COUNT_DISTINCT function for these
cases.
Analytical Functions
– Compute an aggregate value based on a window of rows

– Return one value for every row
– Groups (partitions) are defined by the PARTITION BY clause.
– No PARTITION BY clause: one single partition => the window contains all rows
– The ORDER BY clause defines the sort order within each partition
– No ORDER BY clause: the window contains all rows (within a partition)
– Using ORDER BY: the window contains all rows from the first one to the current row based on the sort order
SELECT MARKET_ID, SALES_TIMESTAMP, PRICE,
sum(PRICE)
OVER(PARTITION BY MARKET_ID ORDER BY SALES_TIMESTAMP)
FROM SALES;
Analytical Functions
Analytical functions are designed to address problems similar to the following:

• Calculate a running total
• Find percentiles within a group
• Top-N queries
• Compute a moving average
Analytical functions compute an aggregate value based on a group of rows. They differ
from aggregate functions in that they do not condense every group to one line, but return
a value for every row. Regarding order of execution, analytical functions are the last set of
operations performed in a query except for the final ORDER BY clause. All joins and all
WHERE, GROUP BY, and HAVING clauses are completed before the analytic functions are
processed. Therefore, analytic functions can appear only in the select list or ORDER BY
clause.
With the help of PARTITION BY you can divide the table into multiple partitions, based on
the specified criteria. As a result, values within each partition will be calculated
independently from the rest of the table, similar to the GROUP BY clause. If no PARTITION
BY clause is stated, an analytical function always refers to the entire table. The ORDER BY
clause specifies the sort order of the rows within each partition. Note that this can differ
from the output sort order. If this clause is specified, not all the rows of a partition will be
used for the result computation, but only a part of them, a so-called "window". By default,
this window consists of all the rows of the respective partition up to the current row
regarding the given sort criteria ("ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT
ROW").
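The default window can be made concrete with a running total. The Python sketch below models sum(PRICE) OVER(PARTITION BY MARKET_ID ORDER BY SALES_TIMESTAMP) on invented sample rows; it assumes that the sort key is unique within each partition, so ties do not need special treatment.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows: (market_id, sales_timestamp, price)
sales = [
    (1, '2018-01-05 10:00', 4.0),
    (2, '2018-01-05 09:00', 1.0),
    (1, '2018-01-05 08:00', 6.0),
    (1, '2018-01-06 12:00', 2.0),
]


def running_sum(rows):
    """Emulate sum(price) OVER (PARTITION BY market_id ORDER BY sales_timestamp)."""
    result = []
    ordered = sorted(rows, key=itemgetter(0, 1))  # partition key, then sort key
    for _, partition in groupby(ordered, key=itemgetter(0)):
        total = 0.0
        for market_id, ts, price in partition:
            total += price  # window: first row of the partition up to the current row
            result.append((market_id, ts, price, total))  # one output row per input row
    return result


for row in running_sum(sales):
    print(row)
```

Unlike an aggregate with GROUP BY, the result keeps all four input rows and only adds the computed column.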
User-defined Functions
– Self-created scalar functions

– Oracle-compatible syntax
– Within functions it is possible to:

– Define and use variables
– Use other functions (user-defined or built-in)
– Use control structures (loops, branches)
– Use scalar sub-queries
User-defined Functions
In Exasol, you can create scalar functions yourself. The syntax is Oracle-compatible.
Within functions, you can define and use variables. The normal SQL data types are valid for
variable and parameter declarations. Any scalar SQL expressions can be used for
expressions, e.g. all built-in functions are available. You can also use scalar sub-queries,
which cannot contain any parameter.
a:=(select max(city_id) from cities);
Please note that a function will be executed for each row, thus it is not recommended to
use scalar sub-queries within functions.
Additionally, control structures such as loops or branches can be used in functions:
-- assignment
res := CASE WHEN input_variable<0 THEN 0 ELSE input_variable END;
-- if-branch
IF input_variable = 0 THEN
res := NULL;
ELSE
res := input_variable;
END IF;
-- for loop
FOR cnt:=1 TO input_variable
DO
res:=res*2;
END FOR;
-- while loop
WHILE cnt<=input_variable
DO
res:=res*2;
cnt:=cnt+1;
END WHILE;
User-defined functions: Example
CREATE OR REPLACE FUNCTION distance
(lat1 numeric(9,6), long1 numeric(9,6), lat2 numeric(9,6), long2 numeric(9,6))
RETURN numeric(9,4)
IS res numeric(9,4);
BEGIN
res:=acos(sin(lat1/180*pi())*sin(lat2/180*pi()) +
cos(lat1/180*pi())*cos(lat2/180*pi())*
cos(long1/180*pi()-long2/180*pi()));
res:=cast(res*6378.137 as numeric(9,4));
RETURN res;
END distance;
/
SELECT c1.CITY_NAME AS CITY_NAME1,
c2.CITY_NAME AS CITY_NAME2,
distance(c1.LAT, c1.LON,
c2.LAT, c2.LON)
AS DISTANCE
FROM CITIES c1
JOIN CITIES c2
ON c1.CITY_NAME < c2.CITY_NAME
;
CITY_NAME1  CITY_NAME2  DISTANCE
Berlin      Leipzig     148.6106
Berlin      Hamburg     255.1255
Hamburg     Leipzig     295.9456
User-defined functions: Example
The example above shows how to create a user-defined function (first
statement) and how to use it later in a SELECT (second statement).
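The formula inside the function is the spherical law of cosines. For checking values outside the database, it can be ported to Python as in the sketch below. The clamping step is an addition that the SQL function does not contain; it only guards acos against floating-point rounding.

```python
import math

EARTH_RADIUS_KM = 6378.137  # same constant as in the SQL function


def distance(lat1, long1, lat2, long2):
    """Great-circle distance in km via the spherical law of cosines."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta = math.radians(long1 - long2)
    cos_angle = (math.sin(phi1) * math.sin(phi2)
                 + math.cos(phi1) * math.cos(phi2) * math.cos(delta))
    # Clamp against floating-point rounding so that acos never sees |x| > 1.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return round(math.acos(cos_angle) * EARTH_RADIUS_KM, 4)


# A quarter of the equator: from (0, 0) to (0, 90).
print(distance(0, 0, 0, 90))
```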
UDF Scripts
– Implement your own scalar, aggregate, analytical or generation functions
– Built-in languages: Java, R, Python, Lua

– Languages extendable using containers
– Usage of well-known libraries

– Development using Exasol interfaces (Java) or SDKs (Python, R)
UDF Scripts
UDF scripts can be used to implement your own scalar, aggregate, analytical or generation
functions.
Built-in languages are Java, R, Python and Lua. The set of languages can be extended using containers.
Well-known libraries can be used.
Development is done using Exasol interfaces (Java) or SDKs (Python, R).
UDF Scripts: Example
– Summing up two values
– Scalar function
– Language Lua
CREATE LUA SCALAR SCRIPT myadd(i int, j int) RETURNS int AS

function run(ctx)
return ctx.i + ctx.j
end
/
– More information in the Advanced Analytics course

– Best practices, SDK usage, …
UDF Scripts: Example
The example above shows a UDF script written in Lua that creates a scalar function to add two
input values.
More details about UDF scripts are given in the Exasol Academy course Advanced Analytics.
LIMIT clause
– Way to limit a result set
– Result is not deterministic
SELECT *
FROM SALES
LIMIT 10;
– LIMIT clause may be specified with an offset
SELECT *
FROM SALES
ORDER BY SALES_ID
LIMIT 10 OFFSET 10;
LIMIT clause
In Exasol, it is possible to limit a result set (without having to specify an ORDER BY
clause). The result of such a query is not deterministic, though.
The LIMIT clause can be combined with the OFFSET clause.
Row numbering
– ROWNUM
– An (arbitrary) row number between 1 and n based on internal storage
– n is the number of rows in the table or subselect
– ROWID
– A unique address of a row inside a table
– The address is reassigned on any DML
– ROW_NUMBER()
– Analytical function which uniquely numbers the rows according to given sort criteria and partitioning
ROWNUM
ROWNUM is a pseudo column which numbers the records of a table or subselect, beginning
with 1. It is subject to certain restrictions on usage.
ROWID
Every row of a table in the database has a unique address, the so-called ROWID
(DECIMAL(36,0) data type). The ROWIDs of a table are managed by the DBMS. This ensures
that the ROWIDs within a table are distinct – in contrast, it is quite acceptable for ROWIDs
of different tables to be the same. Using DML statements such as
INSERT, UPDATE, DELETE, TRUNCATE or MERGE, all the ROWIDs of the relevant tables are
invalidated and reassigned by the DBMS. In contrast to that, structural table changes such
as ALTER TABLE ADD COLUMN, will leave the ROWIDs unchanged. The ROWID pseudo
column is only valid for real tables, not for views or subselects.
An example of using ROWIDs would be the deletion of specific rows in a table, e.g. in order
to restore the UNIQUE property of a compound key where no other criterion is available to
distinguish between rows.
DENSE_RANK, RANK and ROW_NUMBER
These functions have no arguments, the ORDER BY clause is required. They return the rank
or row number within the partition, with the ORDER BY clause determining the ranking or
numbering.
For equal values in the sort expression, both RANK and DENSE_RANK return a common rank
for the affected rows. Unlike RANK, however, DENSE_RANK does not skip the following rank values.
ROW_NUMBER returns unique numbers in any case; rows with equal values are numbered in
arbitrary order.
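The difference between the three functions shows up as soon as the sort expression contains ties. The following Python sketch models a single partition sorted in ascending order; the function name `rankings` is hypothetical. Note that Python's sort keeps tied rows in input order, whereas the database numbers them in arbitrary order.

```python
def rankings(values):
    """Return (value, row_number, rank, dense_rank) for values sorted ascending."""
    result = []
    rank = dense_rank = 0
    previous = object()  # sentinel that equals no real value
    for row_number, value in enumerate(sorted(values), start=1):
        if value != previous:
            rank = row_number  # RANK jumps to the row position, leaving gaps after ties
            dense_rank += 1    # DENSE_RANK advances by one, leaving no gaps
            previous = value
        result.append((value, row_number, rank, dense_rank))
    return result


for row in rankings([10, 20, 20, 30]):
    print(row)
```

For the value 30, RANK yields 4 and DENSE_RANK yields 3, because the two rows with the value 20 share rank 2.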
Row numbering: ROWNUM
– ROWNUM
SELECT SALES_ID, ROWNUM
FROM SALES
WHERE ROWNUM < 10;
SALES_ID ROWNUM
389577429 1
321740964 2
389577438 3
253762032 4
321740973 5
389577447 6
458941123 7
389577513 8
389577522 9
Row numbering: ROWNUM
The example above shows how to use ROWNUM.

Row numbering: ROWID
– ROWID
SELECT SALES_ID, ROWID
FROM SALES
LIMIT 10;
SALES_ID ROWID
389577429 321098139506691362218450234437009408
321740964 321098139506691362218450234437009409
389577438 321098139506691362218450234437009410
253762032 321098139506691362218450234437009411
321740973 321098139506691362218450234437009412
320831004 321098139506691362218731709413720064
252015775 321098139506691362218731709413720065
388667398 321098139506691362218731709413720066
388667407 321098139506691362218731709413720067
Row numbering: ROWID
The example above shows how to use ROWID.

Row numbering: ROW_NUMBER()
– ROW_NUMBER
SELECT * FROM
( SELECT SALES_ID, PRICE,
row_number() over (ORDER BY PRICE DESC) AS MY_ROWNUM
FROM SALES
)
WHERE MY_ROWNUM <= 5;
SALES_ID PRICE MY_ROWNUM

243303093 625.41 1
388393990 622.89 2
26495229 621.33 3
161127730 601.47 4
35297065 587.98 5
Row numbering: ROW_NUMBER()
The example above shows how to use the ROW_NUMBER function.

Constraint checks
– PRIMARY KEY check
SELECT CITY_ID, CITY_NAME, ZIP_CODE
WITH INVALID PRIMARY KEY( CITY_ID )
FROM CITIES;
CITY_ID  CITY_NAME  ZIP_CODE
10729    Hemmerde   59427
         Hannover   30519
10729    Unna       59427
10729    Unna       59427
– UNIQUE check
SELECT CITY_ID, CITY_NAME, ZIP_CODE
WITH INVALID UNIQUE( CITY_ID, CITY_NAME, ZIP_CODE )
FROM CITIES;
CITY_ID  CITY_NAME  ZIP_CODE
10729    Unna       59427
10729    Unna       59427
– FOREIGN KEY check
SELECT MARKET_ID
WITH INVALID FOREIGN KEY( CITY_ID )
FROM MARKETS
REFERENCING CITIES( CITY_ID );
MARKET_ID
1803
1802
Constraint checks
These queries show which rows would violate a constraint if that constraint were
enabled.
The statements do not create a constraint, though.
Verification of the primary key property (PRIMARY KEY)

This construct can be used to check whether a number of columns have the primary key
property. This is the case if the specified columns contain neither duplicate data records
nor NULL values. Rows that do not conform to the primary key property are
selected.
Verification of the uniqueness (UNIQUE)
This construct can be used to verify whether the rows of a number of columns are
unique. This is the case if the specified columns do not contain duplicate data records.
Rows that contain only NULL values in the specified columns are classified
as being unique (even if there is more than one). Non-unique rows are selected.
Verification of the foreign key property (FOREIGN KEY)
This construct can be used to check whether a number of columns in a table (TABLE)
possess the foreign key property (FOREIGN KEY) in relation to another table (REF_TABLE).
This is the case if the specified columns do not contain NULL values and the column values
of each row of the table also exist as a row in the specified columns of the referenced
table (REF_TABLE). Rows that violate the foreign key property are selected.
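The three selection rules described above can be modelled on plain Python tuples. This is a sketch of the rules as stated in the text, not of the Exasol implementation: the sample rows and function names are invented, None stands for NULL, and the treatment of keys that are only partly NULL in the UNIQUE check is an assumption.

```python
from collections import Counter

# Hypothetical sample rows: (city_id, city_name, zip_code); None stands for NULL
cities = [
    (10729, 'Hemmerde', '59427'),
    (None, 'Hannover', '30519'),
    (10729, 'Unna', '59427'),
    (1, 'Berlin', '10115'),
]


def invalid_primary_key(rows, key):
    """Rows whose key contains NULL or occurs more than once."""
    counts = Counter(key(r) for r in rows)
    return [r for r in rows if None in key(r) or counts[key(r)] > 1]


def invalid_unique(rows, key):
    """Rows whose key occurs more than once; all-NULL keys count as unique."""
    counts = Counter(key(r) for r in rows)
    return [r for r in rows
            if counts[key(r)] > 1 and any(v is not None for v in key(r))]


def invalid_foreign_key(rows, key, referenced_keys):
    """Rows whose key contains NULL or has no match in the referenced table."""
    return [r for r in rows if None in key(r) or key(r) not in referenced_keys]


by_city_id = lambda r: (r[0],)  # key function: the checked columns as a tuple
print(invalid_primary_key(cities, by_city_id))
print(invalid_unique(cities, by_city_id))
```

The row with the NULL key fails the primary key check but passes the uniqueness check, which is the main difference between the two.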
MERGE
– UPDATE, INSERT and DELETE in one single statement

– Integrates data from source table into target table
MERGE INTO CITIES c

USING NEW_CITIES n
ON (c.CITY_ID = n.CITY_ID)
WHEN MATCHED THEN UPDATE SET c.AREA = n.AREA,
c.AREA_SHORT = n.AREA_SHORT
DELETE WHERE TODO = 'DELETE'
WHEN NOT MATCHED THEN INSERT VALUES
(n.CITY_ID, n.COUNTRY_CODE, n.ZIP_CODE, n.CITY_NAME,
n.DISTRICT, n.AREA, n.AREA_SHORT, n.LAT, n.LON)
;
MERGE
This statement combines UPDATE, DELETE and INSERT and is a powerful method for data
manipulation, especially within ETL tasks.
The ON condition describes the correlation between the two tables (similar to a join). The
MATCHED clause is used for matching row pairs, the NOT MATCHED clause is used for those
where no match is found. Only equivalence conditions (=) are permitted in the ON
condition.
UPDATE clause: the optional WHERE condition specifies the circumstances under which
the UPDATE is performed. Both the target table and the source table may be referenced
in this condition.
With the optional DELETE condition of the UPDATE clause, rows in the target table can be
deleted. Only rows that have been updated are considered, and only the values after the
UPDATE are available to the condition.
DELETE clause: the optional WHERE condition specifies the circumstances under which the
DELETE is performed.
INSERT clause: the optional WHERE condition specifies the circumstances under which the
INSERT is performed. Only the columns of the source table may be referenced in this
condition.
Notes:
• The source table can be a physical table, a view or a subquery.
• The UPDATE (or DELETE) clause and the INSERT clause are each optional, but at least
one of them must be specified. The order of the clauses can be exchanged.
• If several rows in the source table could apply to an UPDATE of a single row in the
target table, this leads to the error message "Unable to get a stable set of rows
in the source tables" if the original value of the target table would be changed by the
UPDATE candidates.
• Columns used in the ON condition must not be updated.
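The optional WHERE conditions and a subquery as source can be sketched as follows. The
tables and columns come from the slide example; the filter conditions themselves are
illustrative assumptions, not part of the original material:

```sql
-- Sketch only: the WHERE conditions are illustrative assumptions.
MERGE INTO CITIES c
USING (SELECT CITY_ID, CITY_NAME, AREA, AREA_SHORT
       FROM NEW_CITIES
       WHERE CITY_NAME IS NOT NULL) n           -- source is a subquery
ON (c.CITY_ID = n.CITY_ID)                      -- equality condition only
WHEN MATCHED THEN UPDATE SET c.AREA = n.AREA,   -- CITY_ID (ON column) is not updated
                             c.AREA_SHORT = n.AREA_SHORT
    WHERE c.AREA <> n.AREA                      -- may reference target and source
WHEN NOT MATCHED THEN INSERT (CITY_ID, CITY_NAME, AREA, AREA_SHORT)
    VALUES (n.CITY_ID, n.CITY_NAME, n.AREA, n.AREA_SHORT)
    WHERE n.AREA IS NOT NULL                    -- may reference the source only
;
```

If NEW_CITIES could contain several rows per CITY_ID, deduplicating them in the subquery
would be one way to avoid the "stable set of rows" error mentioned above.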
Comments on Database Objects
– Example: Comment on a table and columns
COMMENT ON TABLE SALES IS 'All sales';
COMMENT ON COLUMN SALES.SALES_ID IS 'Sales ID';
COMMENT ON COLUMN SALES.SALES_DATE IS 'Date of sales';
COMMENT ON COLUMN SALES.PRICE IS 'Sum of sales';

DESCRIBE FULL SALES;

COLUMN_NAME   …   COLUMN_COMMENT
SALES_ID      …   Sales ID
SALES_DATE    …   Date of sales
PRICE         …   Sum of sales
…             …   …
Comments on Database Objects
Comments may be given to all database objects:
• Users
• Roles
• Connections
• Schemas
• Tables
• Columns
• Views
• Functions
• Scripts
Comments can be assigned with a dedicated command (“COMMENT ON … IS”, for all objects
except views) or when an object is created (tables and views). Comments on views and
their columns can only be given when the view is created.
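A minimal sketch of assigning comments at creation time is shown below. SALES_ARCHIVE and
V_SALES are hypothetical object names chosen for illustration, and the COMMENT IS
placement reflects the Exasol syntax as best understood here; it should be checked
against the reference manual of the version in use:

```sql
-- Hypothetical objects SALES_ARCHIVE and V_SALES, used only for illustration.
CREATE TABLE SALES_ARCHIVE (
    SALES_ID   DECIMAL(18,0) COMMENT IS 'Sales ID',
    SALES_DATE DATE          COMMENT IS 'Date of sales'
) COMMENT IS 'Archived sales';

-- Views and their columns can only be commented while the view is created.
CREATE VIEW V_SALES (
    SALES_ID   COMMENT IS 'Sales ID',
    SALES_DATE COMMENT IS 'Date of sales'
) AS
    SELECT SALES_ID, SALES_DATE FROM SALES
COMMENT IS 'View on all sales';
```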
