
User Defined Functions in DB2

by Rosmarie Peter, Trivadis AG

User Defined Functions (UDFs) enable users to write their own functions which can be used in SQL
statements. This article describes how this option can be used for «DB2 UDB for z/OS and OS/390».
The differences between UDFs on DB2 mainframe installations and UDFs on DB2 for Linux, UNIX and
Windows are small and mostly come down to differences between the operating systems.
Current practice is to develop UDFs and stored procedures for both environments. This article focuses
on those issues where external UDFs differ from stored procedures (SPs).

1 Introduction

1.1 What are functions?

Functions vastly enhance the power of SQL. They are invoked from within SQL statements: in the
SELECT clause, the FROM clause, or the WHERE clause, depending on the type of function.

Functions can be classified by origin:

• Built-in functions: These are called built-in since they are incorporated in the supplied DB2
  code. Examples of built-in functions are MAX and SUBSTR.
• User Defined Functions (UDFs): These are functions written by users for use in
  SQL statements. The authors may be customers or IBM itself. Examples of UDFs supplied
  by IBM are the MQ functions, or the functions included in extenders.

Functions can also be classified by what they take as input and what they return:

• Column functions take a collection of column values as input and return a single value.
  Examples are SUM, MIN, MAX. They are used in the SELECT clause. In the WHERE clause,
  these functions must be embedded in subselects. Users cannot write their own new column
  functions. They can only be made available as sourced UDFs, typically for User Defined
  Types.
• Scalar functions have one or more input values as function parameters and return a
  scalar value. SUBSTR is an example of this type of function. Scalar functions can be used in
  the SELECT or WHERE clause wherever a single value is permitted. Most built-in functions
  supplied with DB2 are scalar, but users can also write their own scalar
  functions and make them available as UDFs. Scalar UDFs can be written in the SQL
  Procedure Language or in another programming language; the latter are called external
  functions.
• Table functions have one or more input values as function parameters and return a
  table. They are used in the FROM clause. There are no built-in table functions. All table
  functions are external functions.

Functions are part of the SQL standard, but the DBMSs differ considerably with regard to built-in
functions. With User Defined Functions, the degree of portability can be significantly improved.

1.2 User Defined Functions

User Defined Functions (UDFs) enable users to write their own functions, which can be used in SQL
statements (DDL or DML). Three types of UDF exist:

• Sourced UDFs are based on an existing function. The base function can either be a built-in
  function or another UDF.
• Scalar functions can be written in a high-level programming language or in SQL Procedure
  Language.
• Table functions must be written in a high-level programming language.

                       Scalar   Column   Table

Built-in function      yes      yes      -
UDF   SQL              yes      -        -
      External         yes      -        yes
      Sourced          yes      yes      -

Other UDF features:

• All UDFs are entered in the DB2 catalog. This is done using a CREATE FUNCTION statement.
• The name of the UDF consists of a schema name and the name of the function.
• Relational and other data, such as IMS or flat files, can be read within UDFs. UDFs can be
  altered to a certain extent.
• UDFs can be nested up to 16 levels deep. The call hierarchy can contain both UDFs and stored
  procedures.
• UDFs can be invoked from triggers.
• UDFs always run under WLM control.
• UDFs support function overloading. Several functions can be defined with the same
  name, differing only in their parameters. The number of parameters can differ,
  as well as their data types. When the function is invoked, the DBMS selects and
  executes the matching UDF.
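
Function overloading can be sketched with two SQL scalar UDFs that share a name but differ in the number of parameters. The schema TEST, the function FMT_ACCT, the table ACCOUNTS with its INTEGER column ACCT_NO and the account format are assumptions made for illustration; the exact option list should be checked against the SQL Reference for the installed release.

```sql
-- Hypothetical: format a 10-digit account number as NNNN-NNNNNN
CREATE FUNCTION TEST.FMT_ACCT (ACCT INTEGER)
  RETURNS VARCHAR(11)
  LANGUAGE SQL
  DETERMINISTIC
  NO EXTERNAL ACTION
  CONTAINS SQL
  RETURN SUBSTR(DIGITS(ACCT), 1, 4) CONCAT '-' CONCAT SUBSTR(DIGITS(ACCT), 5, 6);

-- Same name, one more parameter: the caller chooses the separator
CREATE FUNCTION TEST.FMT_ACCT (ACCT INTEGER, SEP VARCHAR(1))
  RETURNS VARCHAR(11)
  LANGUAGE SQL
  DETERMINISTIC
  NO EXTERNAL ACTION
  CONTAINS SQL
  RETURN SUBSTR(DIGITS(ACCT), 1, 4) CONCAT SEP CONCAT SUBSTR(DIGITS(ACCT), 5, 6);

-- DB2 picks the UDF whose parameter list matches the arguments
SELECT TEST.FMT_ACCT(ACCT_NO),
       TEST.FMT_ACCT(ACCT_NO, '/')
  FROM ACCOUNTS;
```

The separator parameter is declared as VARCHAR because a string constant such as '/' is a varying-length string and would not match a CHAR parameter during function resolution.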

1.3 Why UDFs?

The functional scope of SQL can be considerably enhanced with UDFs. Built-in functions are very
useful, but they do not always cover all requirements. Reasons for using UDFs:
• Special transformations, for example, converting the account number from an internal to an
  external format.
• Simple calculations, for example, company-specific calculation of years of service.
• Option of standardization.
• Built-in functions for User Defined Types by means of sourced functions.
• Migration from other DBMSs: DBMSs differ considerably in the scope and
  specification of their supplied functions. Many functions used frequently in other DBMSs have
  different names or parameters in DB2, or do not exist in DB2 at all.
• Complex SQL logic can be embedded in UDFs. This enables users to write simpler SQL
  statements.

2 Sourced UDFs

Sourced UDFs are based on existing functions. They are indispensable if User Defined Types
are being used. The built-in functions cannot simply be applied to User Defined Types. If they are
required, then UDFs based on the desired built-in functions must be created, as indicated in the
example below.
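
-- Distinct type for distances in kilometres, based on INTEGER
CREATE DISTINCT TYPE KM
  AS INTEGER
  WITH COMPARISONS;

-- Sourced UDF: makes the built-in MAX available for the type KM
CREATE FUNCTION KM_MAX(KM)
  RETURNS KM
  SOURCE SYSIBM.MAX(INTEGER);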
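
A short usage sketch for the two definitions above. The table TRIPS and its columns are hypothetical; the cast function KM(INTEGER) is generated by DB2 when the distinct type is created.

```sql
-- Hypothetical table using the distinct type
CREATE TABLE TRIPS
  (TRIP_ID  INTEGER NOT NULL,
   DISTANCE KM      NOT NULL);

-- The sourced UDF is used like the built-in MAX
SELECT KM_MAX(DISTANCE)
  FROM TRIPS;

-- Constants must be cast to KM before they can be compared with a KM column
SELECT TRIP_ID
  FROM TRIPS
 WHERE DISTANCE > KM(100);
```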

3 Generating external UDFs


The following steps are involved in creating a UDF:
• CREATE FUNCTION: Registers the UDF with DB2. CREATE FUNCTION needs to be issued only once.
  The program itself can be written after CREATE FUNCTION. If the definition needs to be changed
  later, this can be done using ALTER FUNCTION.
• Writing the program. This can be done in one of the following languages: C, C++, COBOL,
  PL/I, Java. Naturally, the particular features of the individual languages have to be taken into
  account for UDFs, too.
• The program must then be precompiled, compiled and link-edited like any other program. A
  package has to be created from the DBRM.
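
As a minimal sketch of the first step, the statements below register an external scalar UDF written in C and later change its definition. All names (TEST.ACCT_EXT, the load module ACCTEXT, the WLM environments WLMUDF1 and WLMUDF2) are hypothetical, and the option list is only a selection; the SQL Reference for the installed release lists the full set and the defaults.

```sql
CREATE FUNCTION TEST.ACCT_EXT (INTEGER)
  RETURNS CHAR(11)
  EXTERNAL NAME 'ACCTEXT'          -- load module (hypothetical)
  LANGUAGE C
  PARAMETER STYLE DB2SQL
  DETERMINISTIC
  NO SQL
  NO EXTERNAL ACTION
  FENCED
  WLM ENVIRONMENT WLMUDF1          -- WLM application environment (hypothetical)
  PROGRAM TYPE SUB;

-- Later changes to the definition do not require a new CREATE
ALTER FUNCTION TEST.ACCT_EXT (INTEGER)
  WLM ENVIRONMENT WLMUDF2;
```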

The procedure for creating UDFs is very similar to that for stored procedures. Some particular features
of functions, which are discussed below, do have to be taken into account:
• DETERMINISTIC option
• Linkage convention
• SCRATCHPAD option
• FINAL CALL option
• Program logic for scalar functions
• Program logic for table functions

3.1 DETERMINISTIC option

This option is one of many that can be specified for CREATE FUNCTION. If a function is defined as
DETERMINISTIC, it always returns the same result for the same input. SUBSTR is an
example of this type of function. RAND, on the other hand, is not deterministic, because it
returns a different result every time it is invoked. The default is NOT DETERMINISTIC. Because the
description sounds so unremarkable, it is all too easy to leave this option at its default. This can be
harmful, as the following example illustrates.

The UDF in this example does nothing more than convert a number from an internal to an external
format. Several minutes of processing time for a simple statement are unacceptable.

The reason for the processing time was obvious from the EXPLAIN result: A tablespace scan was
executed for the SELECT statement in parentheses in the FROM clause. The result was then
materialized and read with another tablespace scan.

A look at the SQL Reference Manual gave us our explanation.

Readers should take a close look at this short text. NOT DETERMINISTIC means
• a worse access path
• unexpected results.
In the example above, the definition of the UDF was changed to DETERMINISTIC. The result was then
delivered in a fraction of a second, since the existing index was used for the SELECT in parentheses.
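
A statement of the kind described above, with the UDF in a parenthesized SELECT in the FROM clause, might look like the following sketch. The UDF TEST.CONV_NO, the table ACCOUNTS and its columns are hypothetical, and the UDF is assumed to have been created without the DETERMINISTIC keyword. The ALTER statement shows one way the definition could be changed afterwards.

```sql
-- UDF in the select list of a nested table expression (hypothetical names)
SELECT T.EXT_NO, T.BALANCE
  FROM (SELECT TEST.CONV_NO(ACCT_NO) AS EXT_NO, BRANCH, BALANCE
          FROM ACCOUNTS) AS T
 WHERE T.BRANCH = 4711;

-- Declare that the UDF always returns the same result for the same input
ALTER FUNCTION TEST.CONV_NO (INTEGER)
  DETERMINISTIC;
```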

3.2 Linkage for UDFs

The linkage convention defines how the UDF communicates with its caller. The structure
corresponds mostly to the structure of stored procedures that have been defined with
PARAMETER STYLE DB2SQL and DBINFO. There are two additional fields:
• The scratchpad: It is possible to define an area where information can be passed on
  from one call to the next.
• The CALL TYPE: It is used for program control.

As is the case for stored procedures, SQLSTATEs can be set by UDFs. Whatever is inserted in
the SQLSTATE field is visible to the invoker in the SQLCA, together with the text
from the diagnostic data.

3.3 SCRATCHPAD option

A scratchpad is a memory area supplied by DB2 for passing information from one call of the
UDF to the next. CREATE FUNCTION defines whether a scratchpad will be created for a UDF and
what size it should be.
DB2 provides one scratchpad per
• SQL statement
• occurrence within the SQL statement
• parallel task

We will use an example to show what that means. Let’s take a look at the following statement:

SELECT MYUDF(C1,1), MYUDF(C2,1)
  FROM TABA;

Suppose the optimizer decides to execute this statement in three parallel tasks. With two
occurrences of MYUDF in the statement, this means that DB2 makes 2 × 3 = 6 scratchpads available!

The scratchpad is initialized by DB2 to X'00'. The programmers themselves are responsible for
staying within the maximum length. Initializing scratchpads is time-consuming, so if scratchpads
have to be initialized for singleton SELECTs, users will immediately notice longer response times.

3.4 FINAL CALL option

If FINAL CALL is specified, DB2 makes one special call to the UDF
for initialization tasks and one extra call for termination tasks. These
initialization and termination calls are made per
• SQL statement
• occurrence within the SQL statement
• parallel task

FINAL CALL must be specified if special resources have to be
allocated for the UDF. These must then be explicitly released in the
final call.
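
Both the scratchpad and the extra calls are requested in CREATE FUNCTION. The sketch below defines the MYUDF function from section 3.3 with a scratchpad and with FINAL CALL. The parameter and result types, the load module name, the scratchpad length and the WLM environment are assumptions for illustration.

```sql
CREATE FUNCTION MYUDF (INTEGER, INTEGER)   -- parameter types assumed
  RETURNS INTEGER                          -- result type assumed
  EXTERNAL NAME 'MYUDF'
  LANGUAGE C
  PARAMETER STYLE DB2SQL
  NOT DETERMINISTIC
  NO SQL
  NO EXTERNAL ACTION
  SCRATCHPAD 200                           -- length in bytes
  FINAL CALL                               -- extra first call and final call
  DISALLOW PARALLEL                        -- no parallel tasks for this UDF
  WLM ENVIRONMENT WLMUDF1;
```

With DISALLOW PARALLEL, the statement from section 3.3 would get one scratchpad per occurrence of MYUDF, that is two instead of six.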

3.5 Program logic for scalar functions

The program control of UDFs is decided by CALL TYPE, as
the following overview shows:
• If «FINAL CALL» is specified, CALL TYPE is set by DB2:
  > -1 first call
  > 0 normal call
  > +1 final call
  > 255 final call, if the calling application terminates the unit of work
• Without «FINAL CALL», the CALL TYPE for scalar UDFs is irrelevant.
• Error messages:
  > UDF_SQLSTATE
  > UDF_DIAG_MSG

Scalar functions are used in the SELECT or WHERE clause and can thus vastly extend the functional
scope of SQL. They may be invoked several times per SQL statement. For this reason, performance
aspects must be given due consideration when writing the function.

3.6 Program logic for table functions

A table UDF returns a table. It can thus be used as an
alternative to stored procedures with result sets. Table UDFs are
used in FROM clauses:

SELECT COUNT(*)
  FROM TABLE(TEST.TABUDF(1,2)) AS A;

To DB2, a table UDF behaves much like a cursor:

• DB2 sends the UDF program a message to open the cursor.
• In the subsequent calls, the UDF program receives the command to execute a FETCH and to
  return a result row.
• If no more rows are available, the program sets SQLSTATE 02000.
• At the end, DB2 requests the UDF program to execute CLOSE for the cursor.

The requested program logic is again controlled by CALL TYPE.
• Call types without «FINAL CALL»
  > -1 open call
  > 0 fetch call
  > +1 close call
• Call types with «FINAL CALL»
  > -2 first call
  > -1 open call
  > 0 fetch call
  > +1 close call
  > +2 final call
  > 255 final call, if the unit of recovery (UOR) is terminated by the calling application
• Error messages:
  > UDF_SQLSTATE
  > UDF_DIAG_MSG

Table UDFs are invoked in the FROM clause. They can also be used as part of join operations.
Depending on the access path selected by the optimizer, a table function can also become the inner
table of a join. To give the optimizer a chance of selecting an efficient access path, the CARDINALITY
parameter should be specified for CREATE FUNCTION. It specifies the expected number of
result rows.
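
A possible definition of the table UDF used above, with CARDINALITY, followed by a join. The parameter types, the result columns, the load module name, the WLM environment, the value 50 and the table ACCOUNTS are assumptions for illustration.

```sql
CREATE FUNCTION TEST.TABUDF (INTEGER, INTEGER)   -- parameter types assumed
  RETURNS TABLE (ACCT_NO INTEGER,                 -- result columns assumed
                 AMOUNT  DECIMAL(11,2))
  EXTERNAL NAME 'TABUDF'
  LANGUAGE C
  PARAMETER STYLE DB2SQL
  NOT DETERMINISTIC
  READS SQL DATA
  NO EXTERNAL ACTION
  DISALLOW PARALLEL
  CARDINALITY 50                                  -- expected number of result rows
  WLM ENVIRONMENT WLMUDF1;

-- The table UDF takes part in a join like any other table
SELECT A.ACCT_NO, A.BRANCH, T.AMOUNT
  FROM ACCOUNTS A,
       TABLE(TEST.TABUDF(1,2)) AS T
 WHERE A.ACCT_NO = T.ACCT_NO;
```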

4 Design Considerations

4.1 What is feasible and what isn't?

The DB2 manual has lots of information on UDFs. We recommend taking all of it into account, even
those subjects which seem to require little attention. The most important things to consider are:
• SQLSTATE is a CHAR(5) field that requires a valid value every time the function ends.
• Modifying SQL statements are only possible in scalar UDFs that are invoked by an UPDATE or
  INSERT statement.
• In the UDF programs, all cursors must be closed explicitly, otherwise a
  negative SQLCODE is returned. There is no implicit CLOSE.
• If the UDF is defined as NOT DETERMINISTIC, the following restrictions apply:
  > The UDF cannot be used in CASE expressions.
  > It cannot be used in the ORDER BY clause.
  > It should not be used in WHERE clause predicates as the results would be unexpected.
• DISALLOW PARALLEL should be specified in the following cases:
  > If the UDF is NOT DETERMINISTIC
  > If a scratchpad is used
  > If FINAL CALL is specified
  > If MODIFIES SQL DATA is specified for scalar UDFs
  > If EXTERNAL ACTION is specified
  > If a table UDF is being used.

4.2 UDF Efficiency

One of the most important differences between built-in functions and external UDFs is the fact that all
UDFs are FENCED. This protects DB2 against errors in the application code. UDFs therefore do not
run in the DB2 address space, but under the control of Language Environment in a WLM address
space. Built-in functions, on the other hand, are a component of the DB2 code and run in the DB2
address space. Nonetheless, the developers themselves can contribute to the efficiency of the UDFs:
• The number of input parameters should be kept as low as possible since each input parameter
  increases the overhead.
• UDF code should be re-entrant so that the STAY RESIDENT YES option can be specified. This is
  especially important if the same UDF is invoked several times in a single SQL statement. STAY
  RESIDENT YES has the following impact:
  > Once the load module has been loaded, it remains in memory.
  > This single copy can then be used across several UDF calls.
• When using UDFs, the access path should always be checked using EXPLAIN, as UDFs can
  change the access path.
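
The last two points can be sketched as follows. The UDF TEST.ACCT_EXT, the table ACCOUNTS, its columns and the query number are hypothetical, and a PLAN_TABLE is assumed to exist under the current authorization ID.

```sql
-- Keep the re-entrant load module in memory between calls
ALTER FUNCTION TEST.ACCT_EXT (INTEGER)
  STAY RESIDENT YES;

-- Record the access path of a statement that uses the UDF
EXPLAIN PLAN SET QUERYNO = 100 FOR
  SELECT ACCT_NO, BALANCE
    FROM ACCOUNTS
   WHERE TEST.ACCT_EXT(ACCT_NO) = '0000-004711';

-- ACCESSTYPE 'R' is a tablespace scan, 'I' an index access
SELECT QBLOCKNO, PLANNO, METHOD, TNAME, ACCESSTYPE, ACCESSNAME, MATCHCOLS
  FROM PLAN_TABLE
 WHERE QUERYNO = 100
 ORDER BY QBLOCKNO, PLANNO;
```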

5 Summary

UDFs extend the functional scope of SQL considerably. They can embed application code to make it
available to SQL. Writing UDFs is not difficult, but the use of UDFs should be well planned in
order to avoid unpleasant surprises.
