User Defined Functions in DB2
User Defined Functions (UDFs) enable users to write their own functions, which can then be used in SQL
statements. This article describes how this option can be used for «DB2 UDB for z/OS and OS/390».
The difference between UDFs for DB2 mainframe installations and DB2 on Linux, Unix and
Windows is small and is mostly restricted to differences in the operating systems.
Current practice is to develop UDFs and stored procedures for both environments. This article focuses
on those issues where external UDFs differ from stored procedures (SPs).
1 Introduction
Functions vastly enhance the power of SQL. They are invoked from within SQL statements: in the
SELECT clause, the FROM clause, or the WHERE clause, depending on the type of function.
User Defined Functions (UDFs) enable users to write their own functions, which can be used in SQL,
DDL or DML statements. Three types of UDF exist:
Sourced UDFs are based on an existing function. The base function can either be a built-in
function or another UDF.
Scalar functions can be written in a higher programming language or in SQL Procedure
Language.
Table functions must be written in a higher programming language.
The functional scope of SQL can be considerably enhanced with UDFs. Built-in functions are very
useful, but they do not always cover all requirements. Reasons for using UDFs:
Special transformations, for example, converting the account number from an internal to an
external format.
Simple calculations, for example, company-specific calculation of years of service.
Option of standardization.
Built-in functions for User Defined Types by means of sourced functions.
Migration from other DBMSs: The different DBMSs differ considerably in the scope and
specification of their supplied functions. Many functions used frequently in other DBMSs have
different names, parameters, or they do not exist in DB2 at all.
Complex SQL logic can be embedded in UDFs. This enables users to write simpler SQL
statements.
2 Sourced UDFs
Sourced UDFs are based on existing functions. They are indispensable if User Defined Types
are being used, because the built-in functions cannot simply be applied to User Defined Types. If they are
required, UDFs based on the desired built-in functions must be created for a distinct type such as the
one below.
CREATE DISTINCT TYPE KM
AS INTEGER
WITH COMPARISONS;
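Sourced functions for the distinct type KM might look like the following sketch. The choice of SUM and of the "+" operator is an assumption made for illustration, as is the table TEST.TRIPS in the usage query; the source functions are the built-in functions in schema SYSIBM.

```sql
-- Sketch: make the built-in column function SUM usable for the distinct type KM.
-- The logic comes from SYSIBM.SUM; only the parameter and result types differ.
CREATE FUNCTION SUM (KM)
  RETURNS KM
  SOURCE SYSIBM.SUM (INTEGER);

-- Sketch: allow two KM values to be added with the + operator.
CREATE FUNCTION "+" (KM, KM)
  RETURNS KM
  SOURCE SYSIBM."+" (INTEGER, INTEGER);

-- Hypothetical usage, assuming a table TEST.TRIPS with a column DISTANCE of type KM.
SELECT SUM(DISTANCE)
  FROM TEST.TRIPS;
```

Without such sourced functions, the SUM in the usage query would be rejected, because no SUM function accepting an argument of type KM exists.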
3 External UDFs
The procedure for creating external UDFs is very similar to that for stored procedures. Some particular
features of functions, which are discussed below, do have to be taken into account:
DETERMINISTIC option
Linkage convention
SCRATCHPAD option
FINAL CALL option
Program logic for scalar functions
Program logic for table functions
The DETERMINISTIC option is one of many that can be specified for CREATE FUNCTION. If a function is
defined as DETERMINISTIC, it always returns the same result for the same input. SUBSTR is an
example of this type of function. RAND, on the other hand, is not deterministic, because it
returns a different result every time it is invoked. The default is NOT DETERMINISTIC. Because the
option looks unremarkable, it is tempting to ignore it and accept the default. This can be
harmful, as the following example illustrates:
The UDF in this example does nothing more than convert a number from an internal to an external
format. Several minutes of processing time for a simple statement are unacceptable.
The reason for the processing time was obvious from the EXPLAIN result: a tablespace scan was
executed for the SELECT statement in parentheses in the FROM clause (the nested table expression).
The result was then materialized and in turn read with a tablespace scan.
A look at the SQL Reference Manual gave us our explanation.
Readers should take a close look at this short text. NOT DETERMINISTIC means:
a worse access path
unexpected results
In the example above, the definition of the UDF was changed to DETERMINISTIC. The result was then
delivered in a split second, since the existing index was used for the SELECT in the nested table
expression.
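The original statement is not reproduced here. The following sketch shows what such a function definition and a query with a nested table expression could look like on DB2 for z/OS. All names (TEST.EXTACCT, TEST.ACCOUNT, the columns and the load module) are assumptions made for illustration.

```sql
-- Hypothetical external scalar UDF that converts an internal account number
-- to its external format.
CREATE FUNCTION TEST.EXTACCT (ACCT_INT DECIMAL(15,0))
  RETURNS CHAR(20)
  EXTERNAL NAME 'EXTACCT'
  LANGUAGE COBOL
  DETERMINISTIC            -- same input always yields the same result
  NO SQL
  NO EXTERNAL ACTION;

-- Hypothetical query: the UDF is invoked inside the nested table expression,
-- the selective predicate sits outside of it.
SELECT T.ACCT_EXT
  FROM (SELECT ACCT_NO,
               TEST.EXTACCT(ACCT_NO) AS ACCT_EXT
          FROM TEST.ACCOUNT) AS T
 WHERE T.ACCT_NO = 4711;

-- An existing function can be changed without dropping it.
ALTER FUNCTION TEST.EXTACCT DETERMINISTIC;
```

With NOT DETERMINISTIC, DB2 may have to materialize the nested table expression before applying the outer predicate, which would match the behavior described above; whether it does so in a given case should be verified with EXPLAIN.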
A scratchpad is a memory area supplied by DB2 for passing information from one call of the
UDF to the next. CREATE FUNCTION defines whether a scratchpad is created for a UDF and
how large it is.
DB2 provides one scratchpad per
SQL statement
occurrence within the SQL statement
parallel task
We will use an example to show what that means. Let’s take a look at the following statement:
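The optimizer decided to execute this statement in three parallel tasks. Because one scratchpad is
provided per occurrence of the UDF and per parallel task, a statement with two occurrences of the UDF
means that DB2 makes 2 x 3 = 6 scratchpads available!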
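The statement discussed above is not reproduced here. A hypothetical statement of the same shape, with two occurrences of a scratchpad UDF, could look like the sketch below. The function TEST.SCORE, the table TEST.CUSTOMER and all column names are assumptions; ALLOW PARALLEL is specified only so that the parallel-task scenario applies at all.

```sql
-- Hypothetical UDF that uses a 100-byte scratchpad to keep state between calls.
CREATE FUNCTION TEST.SCORE (AMOUNT INTEGER)
  RETURNS INTEGER
  EXTERNAL NAME 'SCORE'
  LANGUAGE C
  NO SQL
  NO EXTERNAL ACTION
  SCRATCHPAD 100
  ALLOW PARALLEL;

-- The UDF occurs twice in this statement.
SELECT C.CUST_NO,
       TEST.SCORE(C.REVENUE)          -- occurrence 1
  FROM TEST.CUSTOMER AS C
 WHERE TEST.SCORE(C.BALANCE) > 50;    -- occurrence 2

-- If DB2 runs the statement in 3 parallel tasks:
-- 2 occurrences x 3 tasks = 6 separate scratchpads.
```

Since each scratchpad is private to one occurrence in one task, state collected in one of them is not visible in the other five.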
The scratchpad is initialized by DB2 to X'00'. Programmers themselves are responsible for
staying within the defined maximum length. Initializing scratchpads is time-consuming, so if
scratchpads have to be initialized for singleton SELECTs, users will immediately notice longer
response times.
If FINAL CALL is specified, DB2 makes one special call to the UDF for initialization tasks and one
extra call for termination tasks. These initialization and termination calls are made per
SQL statement
occurrence within the SQL statement
parallel task
Scalar functions are used in the SELECT or WHERE clause and can thus vastly extend the functional
scope of SQL. They may be invoked several times per SQL statement. For this reason, performance
aspects must be given due consideration when writing the function.
SELECT COUNT(*)
FROM TABLE(TEST.TABUDF(1,2)) AS A;
Table UDFs are invoked in the FROM clause. They can also be used as part of join operations.
Depending on the access path selected by the optimizer, table functions can also become inner
tables. To help the optimizer select a suitable access path, the CARDINALITY
parameter should be specified for CREATE FUNCTION. It specifies the expected number of
result rows.
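A definition with CARDINALITY could be sketched as follows for the table function TEST.TABUDF used above. The parameter types, the result columns, the load module name, the cardinality value and the table TEST.CUSTOMER in the join are assumptions made for illustration.

```sql
-- Sketch of an external table UDF definition with an expected row count.
CREATE FUNCTION TEST.TABUDF (P1 INTEGER, P2 INTEGER)
  RETURNS TABLE (COL1 INTEGER,
                 COL2 CHAR(10))
  EXTERNAL NAME 'TABUDF'
  LANGUAGE COBOL
  NOT DETERMINISTIC
  READS SQL DATA
  NO EXTERNAL ACTION
  DISALLOW PARALLEL        -- table UDFs should not run in parallel tasks
  CARDINALITY 500;         -- optimizer hint: about 500 result rows expected

-- Hypothetical join: the table UDF is invoked once per qualifying customer row.
SELECT C.NAME, A.COL2
  FROM TEST.CUSTOMER AS C,
       TABLE(TEST.TABUDF(C.CUST_NO, 2)) AS A
 WHERE A.COL1 = C.CUST_NO;
```

In the join, the table function references a column of the preceding table, so it is a natural candidate for the inner table; a realistic CARDINALITY value helps the optimizer cost this choice.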
4 Design Considerations
The DB2 manual has lots of information on UDFs. We recommend taking all of it into account, even
those subjects which seem to require little attention. The most important things to consider are:
SQLSTATE is a CHAR(5) field requiring a valid value at each function end.
SQL statements that modify data are only possible in scalar UDFs if the UDF is invoked by an
UPDATE or INSERT statement.
In the UDF programs, all cursors must be closed explicitly, otherwise DB2 returns a
negative SQLCODE. There is no implicit CLOSE.
If the UDF is defined as NOT DETERMINISTIC, the following restrictions apply:
> The UDF cannot be used in CASE expressions.
> It cannot be used in the ORDER BY clause.
> It should not be used in WHERE clause predicates as the results would be unexpected.
DISALLOW PARALLEL should be specified in the following cases:
> If the UDF is NOT DETERMINISTIC
> If a scratchpad is used
> If FINAL CALL is specified
> If MODIFIES SQL DATA is specified for scalar UDFs
> If EXTERNAL ACTION is specified
> If a table UDF is being used.
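As a sketch of the last recommendation, a scalar UDF that is not deterministic, uses a scratchpad and requests a final call would be defined with DISALLOW PARALLEL. The function name TEST.RUNTOTAL, its parameter and the load module name are hypothetical.

```sql
-- Hypothetical UDF that accumulates a running total in its scratchpad.
CREATE FUNCTION TEST.RUNTOTAL (AMOUNT INTEGER)
  RETURNS INTEGER
  EXTERNAL NAME 'RUNTOTAL'
  LANGUAGE C
  NOT DETERMINISTIC        -- result depends on the previous calls
  NO SQL
  NO EXTERNAL ACTION
  SCRATCHPAD 100           -- state carried from call to call
  FINAL CALL               -- extra calls for initialization and cleanup
  DISALLOW PARALLEL;       -- one scratchpad per occurrence, no parallel tasks
```

With DISALLOW PARALLEL, the running total is kept in a single scratchpad per occurrence instead of being split across several parallel tasks.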
One of the most important differences between built-in functions and external UDFs is the fact that all
external UDFs run FENCED. This protects DB2 against errors in the application code. UDFs therefore do
not run in the DB2 address space, but under the control of Language Environment in a WLM address
space. Built-in functions, on the other hand, are a component of the DB2 code and run in the DB2
address space. Nonetheless, the developers themselves can contribute to the efficiency of the UDFs:
The number of input parameters should be kept as low as possible since each input parameter
increases the overhead.
UDF code should be re-entrant so that the STAY RESIDENT YES option can be defined. This is
especially important if the same UDF is invoked several times in a single SQL statement. STAY
RESIDENT YES has the following impact:
> Once the load module has been loaded, it remains in memory.
> This single copy can then be used across several UDF calls.
When using UDFs, the access path should always be checked using EXPLAIN, as UDFs can
change the access path.
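The last two recommendations could be applied as in the following sketch for DB2 for z/OS. It assumes an existing, re-entrant external UDF TEST.EXTACCT, a table TEST.ACCOUNT and a PLAN_TABLE under the current SQLID; the names, the literal and the query number are hypothetical.

```sql
-- Keep the load module in memory after the first invocation.
ALTER FUNCTION TEST.EXTACCT STAY RESIDENT YES;

-- Record the access path of a statement that invokes the UDF.
EXPLAIN PLAN SET QUERYNO = 100 FOR
  SELECT ACCT_NO
    FROM TEST.ACCOUNT
   WHERE TEST.EXTACCT(ACCT_NO) = 'CH-0004711';

-- Inspect the result: ACCESSTYPE 'R' is a tablespace scan, 'I' an index access.
SELECT QBLOCKNO, PLANNO, TNAME, ACCESSTYPE, ACCESSNAME, MATCHCOLS
  FROM PLAN_TABLE
 WHERE QUERYNO = 100
 ORDER BY QBLOCKNO, PLANNO;
```

Comparing the PLAN_TABLE rows before and after a UDF is introduced into a statement shows whether the function has changed the access path.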
5 Summary
UDFs extend the functional scope of SQL considerably. They can embed application code to make it
available to SQL. Writing UDFs is not difficult, but their use should be well planned in
order to avoid unpleasant surprises.