SAP IQ Unstructured Data Analytics en
SAP IQ 16.1 SP 05
Document Version: 1.0.0 – 2023-06-26
4 External Libraries
4.1 Pre-Filter and Term-Breaker External Libraries
External Library Restrictions
4.2 External Libraries on Multiplex Servers
4.3 Enable and Disable External Libraries on Startup
Binary Large Object (BLOB) and Character Large Object (CLOB) storage and retrieval.
Learn about unstructured data analytics in SAP IQ and about large object data compatibility and conformance.
In this section:
Audience
This guide is for users who require reference material for working with unstructured data in SAP IQ.
Compatibility
SAP SQL Anywhere (SA) and SAP Adaptive Server Enterprise (ASE) store large text and binary objects.
2.1 Audience
This guide is for users who require reference material for working with unstructured data in SAP IQ.
Learn about available syntax, parameters, functions, stored procedures, indexes, and options related to
unstructured data analytics features. Use this guide as a reference to understand storage and retrieval of
unstructured data within the database.
The Unstructured Data Analytics Option extends the capabilities of SAP IQ to allow storage, retrieval, and full
text searching of binary large objects (BLOBs) and character large objects (CLOBs) within the database.
Note
Users must be specifically licensed to use the Unstructured Data Analytics functionality described in this
product documentation.
As data volumes increase, the need to store large object (LOB) data in a relational database also increases.
LOB data may be either:
Typical LOB data sources include images, maps, documents (for example, PDF files, word processing files, and
presentations), audio, video, and XML files. SAP IQ can manage individual LOB objects containing gigabytes
(GB), terabytes (TB), or even petabytes (PB) of data.
By allowing relational and unstructured data in the same location, SAP IQ lets organizations access both types
of data using the same application and the same interface. The full text search capability of SAP IQ supports
text archival applications (text analytics) in handling unstructured and semistructured data.
Full text searching uses TEXT indexes to search for terms and phrases in a database without having to scan
table rows.
A TEXT index stores positional information for terms in the indexed column. Text configuration objects control
the terms that are placed in a TEXT index when it is built or refreshed, and how a full text query is interpreted.
Using a TEXT index to find rows that contain a term or phrase is generally faster than scanning every row in the
table.
2.3 Compatibility
SAP SQL Anywhere (SA) and SAP Adaptive Server Enterprise (ASE) store large text and binary objects.
SAP SQL Anywhere can store large objects (up to a 2 GB maximum length) in columns of data type
LONG VARCHAR or LONG BINARY. The support of these data types by SAP SQL Anywhere is SQL/2003
compliant. SAP SQL Anywhere does not support the BYTE_LENGTH64, BYTE_SUBSTR64, BFILE, BIT_LENGTH,
OCTET_LENGTH, CHAR_LENGTH64, and SUBSTRING64 functions.
SAP ASE can store large text objects (up to a 2 GB maximum length) and large binary objects (up to a 2 GB
maximum length) in columns of data type TEXT or IMAGE, respectively. The support of these data types by SAP
ASE is an ANSI SQL Transact-SQL extension.
A LONG BINARY column of a proxy table maps to a VARBINARY(max) column in a Microsoft SQL Server table.
SAP IQ LONG BINARY and LONG VARCHAR functionality conforms to the Core level of the ISO/ANSI SQL
standard.
Learn about working with TEXT indexes and text configuration objects.
A TEXT index stores positional information for terms in an indexed column. TEXT indexes are created using
settings stored in a text configuration object. A text configuration object controls characteristics of TEXT index
data, such as terms to ignore, and the minimum and maximum length of terms to include in the index.
Before you can perform a full text search, you must create a TEXT index on the columns you want to search. A
TEXT index stores positional information for terms in the indexed columns. Queries that use TEXT indexes are
generally faster than those that must scan all the values in the table.
When you create a TEXT index, you can specify which text configuration object to use when creating and
refreshing the TEXT index. A text configuration object contains settings that affect how an index is built. If you
do not specify a text configuration object, the database server uses a default configuration object.
You can create TEXT indexes on these types of columns: CHAR, VARCHAR, and LONG VARCHAR, as well as
BINARY, VARBINARY, and LONG BINARY. BINARY, VARBINARY, and LONG BINARY columns require that the
TEXT index use a text configuration with an external prefilter library.
In this section:
Modifying the TEXT Index Location Using Interactive SQL
Change the dbspace where the TEXT index is stored.
Conjunction of terms
• WD index: Yes, expressed in the form: tbl.col CONTAINS('great','white','whale')
• TEXT index: Yes, expressed in the form: CONTAINS(tbl.col,'great white whale')
General boolean expressions
• WD index: Yes
• TEXT index: Yes
Prefix search
• TEXT index: CONTAINS(tbl.col,'whale*')
Acceleration of LIKE predicates
• WD index: Yes, for example: tbl.col LIKE 'whale%'
• TEXT index: Yes, if the TEXT index is an NGRAM TEXT index without an external document pre-filter. For example: c LIKE '%apple%fruit'
Proximity and phrase searches
• TEXT index: CONTAINS(tbl.col,'white BEFORE whale')
• TEXT index: CONTAINS(tbl.col,'whale NEAR white')
• TEXT index: CONTAINS(tbl.col,'"white whale"')
In a TEXT index, searching for terms that match a prefix and searching with a LIKE expression have different
semantics and may return very different results, depending on the text configuration. The minimum term
length, maximum term length, and stoplist settings govern prefix processing but do not affect LIKE
semantics.
Note
The meaning of Boolean expressions differs between a WD index and a TEXT index when term dropping occurs,
because the effect of dropped terms in TEXT index processing has no equivalent in the WD index.
Before you can perform a full text search, you must create a TEXT index on the columns you want to search.
Context
A TEXT index stores positional information for terms in the indexed columns.
1. Connect to the database as a user with CREATE ANY INDEX or CREATE ANY OBJECT system privilege, or
as the owner of the underlying table.
2. Execute a CREATE TEXT INDEX statement.
Example
This example creates a TEXT index, myTxtIdx, on the CompanyName column of the Customers table in the
iqdemo database. The default_char text configuration object is used:
Where:
The temporary space required in bytes for the TEXT index is (<M>+20)* <T>, where <M> is the maximum term
length for the text configuration in bytes:
Note
• SAP IQ does not provide support for TEXT indexes spanning multiple columns.
• TEXT index manual refresh or automatic refresh options are not supported.
• sp_iqrebuildindex cannot be used to build TEXT indexes.
• You cannot create TEXT indexes within BEGIN PARALLEL IQ…END PARALLEL IQ.
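The temporary-space formula given before the note, (M + 20) * T, can be checked with a few lines of Python. This is only a sketch: the surviving text defines M (the maximum term length in bytes) but not T, so the code assumes T is the number of terms to be indexed. The function and parameter names are hypothetical.

```python
def text_index_temp_space_bytes(max_term_length_bytes: int, term_count: int) -> int:
    """Estimate TEXT index temporary space as (M + 20) * T.

    max_term_length_bytes is M from the formula.
    term_count is T; treating T as the number of terms to be indexed
    is an assumption, because the source text does not define it.
    """
    if max_term_length_bytes <= 0:
        raise ValueError("max_term_length_bytes must be positive")
    if term_count < 0:
        raise ValueError("term_count must not be negative")
    return (max_term_length_bytes + 20) * term_count


# Example: a 20-byte maximum term length and one million terms.
print(text_index_temp_space_bytes(20, 1_000_000))
```

With a multibyte character set, M in bytes can be larger than the MAXIMUM TERM LENGTH in characters, so the estimate grows accordingly.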
Procedure
1. Connect to the database as a user with any of the following system privileges:
• CREATE ANY INDEX
• ALTER ANY INDEX
• DROP ANY INDEX
• CREATE ANY OBJECT
• ALTER ANY OBJECT
• DROP ANY OBJECT
• MANAGE ANY DBSPACE
2. Execute a SELECT statement.
Example
Change the settings for the TEXT index, including the dbspace and TEXT index name.
Procedure
1. Connect to the database as a user with ALTER ANY INDEX or ALTER ANY OBJECT system privilege or as
the owner of the underlying table.
2. Execute an ALTER TEXT INDEX statement.
Example
Procedure
1. Connect to the database as a user with MANAGE ANY DBSPACE system privilege.
2. Execute an ALTER TEXT INDEX statement with the MOVE TO clause.
Example
Procedure
1. Connect to the database as a user with DROP ANY INDEX or DROP ANY OBJECT system privilege or as the
owner of the underlying table.
2. Execute a DROP TEXT INDEX statement.
Example
The only supported refresh type for TEXT indexes on SAP IQ tables is Immediate Refresh, which occurs when
data in the underlying table changes.
Immediate-refresh TEXT indexes on tables support isolation level 3. They are populated at creation time and
every time the data in the column is changed using an INSERT, UPDATE, or DELETE statement. An exclusive
lock is held on the table during the initial refresh.
Allowed Values
0–2
Default
Scope
Requires the SET ANY PUBLIC OPTION system privilege to set this option for PUBLIC or for another user or role.
Can be set temporary only for the PUBLIC role. Takes effect immediately.
Description
TEXT_DELETE_METHOD specifies the algorithm used during a delete operation in a TEXT index. When this
option is not set or is set to 0, the delete method is selected by the cost model. The cost model considers the
CPU-related costs as well as I/O-related costs in selecting the appropriate delete algorithm. The cost model
takes into account:
• Rows deleted
• Index size
• Width of index data type
• Cardinality of index data
• Available temporary cache
• Machine-related I/O and CPU characteristics
• Available CPUs and threads
Example
An NGRAM TEXT index stores the text in the column by breaking it into n-grams of length N, where N is
a value specified by the user.
You can perform a search over an NGRAM TEXT index by matching the n-grams of the text value in the
CONTAINS clause of the query against the stored n-grams in the index.
NGRAM TEXT index accommodates fuzzy searching capability over the text for both European and non-
European languages.
Note
NGRAM TEXT index search is mainly useful when words are misspelled. SAP IQ does not support synonym
or antonym searches.
The NGRAM term breaker is built on TEXT indexes, so use text configuration object settings to define whether
to use an NGRAM or GENERIC TEXT index.
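To illustrate why n-grams tolerate misspellings, the following Python sketch breaks two words into n-grams and measures how many of the query's n-grams also occur in the indexed word. The overlap ratio is an illustrative measure only, not the scoring that SAP IQ applies, and the function names are hypothetical.

```python
def ngrams(term: str, n: int) -> list[str]:
    """Return the consecutive n-character substrings of term, in order."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [term[i:i + n] for i in range(len(term) - n + 1)]


def shared_ngram_ratio(query: str, indexed: str, n: int) -> float:
    """Fraction of the query's distinct n-grams that also occur in indexed."""
    query_grams = set(ngrams(query.lower(), n))
    if not query_grams:
        return 0.0
    indexed_grams = set(ngrams(indexed.lower(), n))
    return len(query_grams & indexed_grams) / len(query_grams)


print(ngrams("whale", 3))
# The misspelled "whalle" still shares n-grams with "whale".
print(shared_ngram_ratio("whalle", "whale", 3))
```

A misspelled word shares many of its n-grams with the intended word, which is what a term-based GENERIC index cannot exploit.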
Text configuration objects control the terms that are placed in a TEXT index when it is built or refreshed, and
how a full text query is interpreted.
When the database server creates or refreshes a TEXT index, it uses the settings for the text configuration
object specified when the TEXT index was created. If a text configuration object is not specified, the database
server chooses one of the default text configuration objects, based on the type of data in the columns being
indexed. In an SAP IQ database, the default_char text configuration object is always used.
Text configuration objects specify which prefilter library and which term breaker are used to generate terms
from the documents to be indexed. They specify the minimum and maximum length of terms to be stored
within the TEXT index, along with the list of terms that should not be included. Text configuration objects
consist of these parameters:
• Document pre-filter – removes unnecessary information, such as formatting and images, from the
document. The filtered document is then picked up by other modules for further processing. The
document pre-filter is provided by a third-party vendor.
• Document term-breaker – breaks the incoming byte stream into terms separated by term separators or
according to specified rules. The document term-breaker is provided by the server or a third-party vendor.
• Stoplist processor – specifies the list of terms to be ignored while building the TEXT index.
The default text configuration object default_char is used with non-NCHAR data. This configuration is
created the first time you create a text configuration object or TEXT index.
The text configuration object default_nchar is supported for use with NCHAR data for TEXT indexes on IN
SYSTEM tables; you cannot use the default_nchar text configuration for TEXT indexes on SAP IQ tables.
The following shows the default settings for default_char and default_nchar, which are best suited for
most character-based languages. We strongly suggest that you do not change the settings in the default text
configuration objects.
STOPLIST (empty)
If you delete a default text configuration object, it is automatically re-created with default values the next time
you create a TEXT index or text configuration object.
Create a text configuration object to specify how the TEXT indexes that depend on it handle terms within
the data.
Procedure
1. Connect to the database as a user with CREATE TEXT CONFIGURATION or CREATE ANY TEXT
CONFIGURATION or CREATE ANY OBJECT system privilege.
2. Execute a CREATE TEXT CONFIGURATION statement.
Example
To create a text configuration object called myTxtConfig using the default_char text configuration object
as a template:
Learn about text configuration object settings, how they affect what is indexed, and how a full text search query
is interpreted.
The TERM BREAKER setting specifies the algorithm to use for breaking strings into terms.
Note
NGRAM term breakers store n-grams. An n-gram is a group of characters of length <n>, where <n> is the
value of MAXIMUM TERM LENGTH.
Regardless of the term breaker you specify, the database server records in the TEXT index the original
positional information for terms when they are inserted into the TEXT index. In the case of n-grams, what
is stored is the positional information of the n-grams, not the positional information for the original terms. The
TERM BREAKER impact is as follows:
Impact on the TEXT index:
• GENERIC TEXT index – when building a GENERIC TEXT index (the default), groups of alphanumeric
characters appearing between non-alphanumeric characters are processed as terms by the database server.
After the terms have been defined, terms that exceed the term-length settings, and terms found in the
stoplist, are counted, but not inserted, in the TEXT index.
• NGRAM TEXT index – when building an NGRAM TEXT index, the database server treats as a term any group
of alphanumeric characters between non-alphanumeric characters. Once the terms are defined, the database
server breaks the terms into n-grams. In doing so, terms shorter than <n>, and n-grams that are in the
stoplist, are discarded.
Impact on query terms:
• GENERIC TEXT index – when querying a GENERIC TEXT index, terms in the query string are processed as if
they were being indexed. Matching is performed by comparing query terms to terms in the TEXT index.
• NGRAM TEXT index – when querying an NGRAM TEXT index, terms in the query string are processed in the
same manner as if they were being indexed. Matching is performed by comparing n-grams from the query
terms to n-grams from the indexed terms.
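The two term-breaker behaviors described above can be approximated in Python as follows. This is a minimal sketch under stated assumptions, not the server's implementation: it treats only ASCII letters and digits as alphanumeric, compares stoplist entries case-insensitively, and uses hypothetical function names. The sample string is the one this guide uses in its text configuration examples.

```python
import re


def split_terms(text: str) -> list[str]:
    """Treat each run of ASCII letters or digits as one term (a simplification)."""
    return re.findall(r"[A-Za-z0-9]+", text)


def generic_terms(text, min_len=1, max_len=20, stoplist=frozenset()):
    """Return (term, position) pairs kept by a GENERIC-style term breaker.

    Dropped terms still advance the position counter, mirroring the rule
    that such terms are counted but not inserted.
    """
    stop = {s.lower() for s in stoplist}
    kept = []
    for position, term in enumerate(split_terms(text), start=1):
        if min_len <= len(term) <= max_len and term.lower() not in stop:
            kept.append((term, position))
    return kept


def ngram_terms(text, n, stop_ngrams=frozenset()):
    """Return the n-grams kept by an NGRAM-style term breaker, in order.

    Terms shorter than n and n-grams found in stop_ngrams are discarded.
    """
    stop = {s.lower() for s in stop_ngrams}
    grams = []
    for term in split_terms(text):
        if len(term) < n:
            continue
        for i in range(len(term) - n + 1):
            gram = term[i:i + n]
            if gram.lower() not in stop:
                grams.append(gram)
    return grams


sample = "I'm not sure I understand"
print(generic_terms(sample, min_len=2, stoplist={"not"}))
print(ngram_terms(sample, 3))
```

Note how the apostrophe in I'm acts as a separator, so the GENERIC breaker sees the two terms I and m.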
The MINIMUM TERM LENGTH setting specifies the minimum length, in characters, for terms inserted in the
index or searched for in a full text query.
MINIMUM TERM LENGTH has special implications on prefix searching. The value of MINIMUM TERM LENGTH
must be greater than 0. If you set it higher than MAXIMUM TERM LENGTH, then MAXIMUM TERM LENGTH is
automatically adjusted to be equal to MINIMUM TERM LENGTH.
The default for MINIMUM TERM LENGTH is taken from the setting in the default text configuration object, which
is typically 1. The MINIMUM TERM LENGTH impact is as follows:
Impact on the TEXT index:
• GENERIC TEXT index – for GENERIC TEXT indexes, the TEXT index will not contain words shorter than
MINIMUM TERM LENGTH.
• NGRAM TEXT index – for NGRAM TEXT indexes, this setting is ignored.
Impact on query terms:
• GENERIC TEXT index – when querying a GENERIC TEXT index, query terms shorter than MINIMUM TERM
LENGTH are ignored because they cannot exist in the TEXT index.
• NGRAM TEXT index – the MINIMUM TERM LENGTH setting has no impact on full text queries on NGRAM
TEXT indexes.
The MAXIMUM TERM LENGTH setting specifies the maximum length, in characters, for terms inserted in the
index or searched for in a full text query.
The MAXIMUM TERM LENGTH setting is used differently, depending on the term breaker algorithm. The value
of MAXIMUM TERM LENGTH must be less than or equal to 60. If you set MAXIMUM TERM LENGTH lower than
the MINIMUM TERM LENGTH, then MINIMUM TERM LENGTH is automatically adjusted to be equal to MAXIMUM
TERM LENGTH.
The default for this setting is taken from the setting in the default text configuration object, which is typically
20. The MAXIMUM TERM LENGTH impact is as follows:
Impact on the TEXT index:
• GENERIC TEXT index – for GENERIC TEXT indexes, MAXIMUM TERM LENGTH specifies the maximum length,
in characters, for terms inserted in the TEXT index.
• NGRAM TEXT index – for NGRAM TEXT indexes, MAXIMUM TERM LENGTH determines the length of the
n-grams that terms are broken into. An appropriate choice of length for MAXIMUM TERM LENGTH depends
on the language. Typical values are 4 or 5 characters for English, and 2 or 3 characters for Chinese.
Impact on query terms:
• GENERIC TEXT index – for GENERIC TEXT indexes, query terms longer than MAXIMUM TERM LENGTH are
ignored because they cannot exist in the TEXT index.
• NGRAM TEXT index – for NGRAM TEXT indexes, query terms are broken into n-grams of length <n>, where
<n> is the same as MAXIMUM TERM LENGTH. The database server uses the n-grams to search the TEXT
index. Terms shorter than MAXIMUM TERM LENGTH are ignored because they do not match the n-grams in
the TEXT index.
The default for the stoplist setting is taken from the setting in the default text configuration object, which
typically has an empty stoplist. The STOPLIST impact is as follows:
Impact on the TEXT index:
• GENERIC TEXT index – for GENERIC TEXT indexes, terms that are in the stoplist are not inserted into the
TEXT index.
• NGRAM TEXT index – for NGRAM TEXT indexes, the TEXT index does not contain the n-grams formed from
the terms in the stoplist.
Impact on query terms:
• GENERIC TEXT index – for GENERIC TEXT indexes, query terms that are in the stoplist are ignored because
they cannot exist in the TEXT index.
• NGRAM TEXT index – terms in the stoplist are broken into n-grams, which are used for the stoplist.
Likewise, query terms are broken into n-grams, and any that match n-grams in the stoplist are dropped
because they cannot exist in the TEXT index.
Consider carefully whether to put terms into your stoplist. In particular, do not include words that have
non-alphanumeric characters in them, such as apostrophes or dashes. These characters act as term breakers.
For example, the word you'll (which you specify as 'you'll') is broken into you and ll, and stored in the
stoplist as these two terms. Subsequent full-text searches for 'you' or 'they'll' are negatively impacted.
Stoplists in NGRAM TEXT indexes can cause unexpected results because the stoplist that is stored is actually in
n-gram form, not the actual stoplist terms you specified. For example, in an NGRAM TEXT index where MAXIMUM
TERM LENGTH is 3, if you specify STOPLIST 'there', these n-grams are stored as the stoplist: the her
ere. This impacts the ability to query for any terms that contain the n-grams the, her, and ere.
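Both stoplist pitfalls described above can be reproduced with a short Python sketch. It assumes that only ASCII letters and digits count as alphanumeric and uses hypothetical function names; it is meant to show the effect, not how the server stores stoplists.

```python
import re


def stoplist_terms(stoplist: str) -> list[str]:
    """Split a stoplist string into terms at non-alphanumeric characters."""
    return re.findall(r"[A-Za-z0-9]+", stoplist)


def stoplist_ngrams(stoplist: str, n: int) -> list[str]:
    """Return the distinct n-grams formed from the stoplist terms, in order."""
    grams = []
    for term in stoplist_terms(stoplist):
        for i in range(len(term) - n + 1):
            gram = term[i:i + n]
            if gram not in grams:
                grams.append(gram)
    return grams


def dropped_query_ngrams(query_term: str, stoplist: str, n: int) -> list[str]:
    """Return the n-grams of query_term that the n-gram stoplist would drop."""
    stop = set(stoplist_ngrams(stoplist, n))
    term_grams = [query_term[i:i + n] for i in range(len(query_term) - n + 1)]
    return [gram for gram in term_grams if gram in stop]


print(stoplist_terms("you'll"))                # the apostrophe splits the word
print(stoplist_ngrams("there", 3))             # n-grams stored for STOPLIST 'there'
print(dropped_query_ngrams("weather", "there", 3))
```

The last call shows the side effect: a query for weather loses two of its five 3-grams even though weather was never placed in the stoplist.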
Procedure
1. Connect to the database as a user with CREATE TEXT CONFIGURATION or CREATE ANY TEXT
CONFIGURATION or CREATE ANY OBJECT system privilege.
2. Execute a SELECT statement.
Example
Change the settings of the text configuration object, including the dbspace and permitted term length range.
Context
You can alter only text configuration objects that are not being used by a TEXT index.
Procedure
1. Connect to the database as a user with ALTER ANY TEXT CONFIGURATION or ALTER ANY OBJECT system
privilege, or as the owner of the text configuration object.
2. Execute an ALTER TEXT CONFIGURATION statement.
Example
To alter the minimum term length for the myTxtConfig text configuration object:
Modify the stoplist, which contains a list of terms to ignore when building a TEXT index with this text
configuration.
Context
You can alter only text configuration objects that are not being used by a TEXT index.
Procedure
1. Connect to the database as a user with ALTER ANY TEXT CONFIGURATION or ALTER ANY OBJECT system
privilege, or as the owner of the text configuration object.
Example
Context
Only text configurations that are not being used by a TEXT index can be dropped.
Procedure
1. Connect to the database as a user with DROP ANY TEXT CONFIGURATION or DROP ANY OBJECT system
privilege.
2. Execute a DROP TEXT CONFIGURATION statement.
Example
Review the samples to understand how text configuration settings impact the TEXT index, and how the index is
interpreted.
All the examples use the string 'I'm not sure I understand'.
STOPLIST: ''
The parenthetical numbers in the Interpreted string column in the table "CONTAINS string interpretations"
reflect the position information stored for each term. The numbers are for illustration purposes in the
documentation. The actual stored terms do not include the parenthetical numbers.
Note
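CONTAINS string: 'wonderlandwonderlandwonderland*'
Interpreted string: ''
CONTAINS string: '"wonderlandwonderlandwonderland* wonderland"'
Interpreted string: '"wonderland(1)"'
CONTAINS string: '"wonderlandwonderlandwonderland* weather"'
Interpreted string: '"weather(1)"'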
Allowed Values
0 – 300
Default
Scope
Requires the SET ANY PUBLIC OPTION system privilege to set this option. Can be set temporary, for an
individual connection, or for the PUBLIC role. Takes effect immediately.
Description
MAX_PREFIX_PER_CONTAINS_PHRASE specifies the maximum number of prefix terms allowed in a phrase of a
CONTAINS expression in a text search.
When this option is set to 0, any number of prefix terms is allowed. SAP IQ detects and reports an error if the
query has any CONTAINS expressions with a phrase having more prefix terms than specified by this option.
Examples
SET MAX_PREFIX_PER_CONTAINS_PHRASE = 1
With the default MAX_PREFIX_PER_CONTAINS_PHRASE setting of 1, this CONTAINS clause returns a syntax
error:
When MAX_PREFIX_PER_CONTAINS_PHRASE is set equal to 0 (no limit) or 2, this CONTAINS clause is valid.
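The limit check can be pictured with the following Python sketch, which counts the prefix terms (terms ending in an asterisk) inside each double-quoted phrase of a query string. It is a simplified illustration, not the server's CONTAINS parser: the function names are hypothetical, the sample query strings are made up for the example, and the sketch assumes that prefix terms outside phrases do not count toward the limit.

```python
import re


def max_prefix_terms_in_phrases(query: str) -> int:
    """Return the largest number of prefix terms found in any quoted phrase."""
    phrases = re.findall(r'"([^"]*)"', query)
    counts = (sum(1 for term in phrase.split() if term.endswith("*"))
              for phrase in phrases)
    return max(counts, default=0)


def violates_limit(query: str, limit: int) -> bool:
    """Return True if a phrase has more prefix terms than limit; 0 means no limit."""
    if limit == 0:
        return False
    return max_prefix_terms_in_phrases(query) > limit


query = '"shi* fab*"'              # hypothetical phrase with two prefix terms
print(violates_limit(query, 1))    # rejected under a limit of 1
print(violates_limit(query, 2))    # accepted under a limit of 2
print(violates_limit(query, 0))    # accepted when no limit applies
```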
Learn about using external libraries to supply prefiltering and term breaking for documents.
SAP IQ can use external pre-filter and term-breaker libraries written in C or C++ to prefilter and tokenize
the documents during index creation or query processing. These libraries can be dynamically loaded into the
process space of the database server.
Note
External pre-filter and term-breaker libraries are provided by an SAP-certified partner. Before using any
such pre-filters or term-breaker libraries, make sure the company is SAP-certified.
The external dynamically loadable pre-filter and term-breaker libraries are specified in the text configuration,
and need to be loaded by the database server. Each library contains an exported symbol that implements the
external function specified in the text configuration object. This function returns a set of function descriptors
that are used by the caller to perform the necessary tasks.
The external pre-filter and term-breaker libraries are loaded by the database server with the first CREATE TEXT
INDEX request, when a query for a given column is received that requires the library to be loaded, or when the
TEXT index needs to be updated.
The libraries are not loaded when an ALTER TEXT CONFIGURATION call is made, nor are they automatically
unloaded when a DROP TEXT CONFIGURATION call is made. The external pre-filter and term-breaker libraries
are not loaded if the server is started with the startup option to disallow the loading of external libraries.
Because these external C/C++ libraries involve the loading of non-server library code into the process space
of the server, there are potential risks to data integrity, data security, and server robustness from poorly
The ISYSTEXTCONFIG system view stores information about the external libraries associated with a text
configuration object.
Text configuration objects and TEXT indexes using external libraries have limitations by design.
• For TEXT indexes on binary columns, you must use external libraries provided by external vendors for
document conversion. SAP IQ does not implicitly convert documents stored in binary columns.
• N-gram based text searches are not supported if an external term-breaker is used to tokenize the
document.
All multiplex servers must have access to the pre-filter and term-breaker external libraries.
Users must ensure that each external pre-filter and term-breaker library is copied to the machine hosting a
multiplex server and placed in a location where the server is able to load the library.
Each multiplex server works independently of other servers when calling the external pre-filter and term-
breaker. Each process space can have the external libraries loaded and perform its own executions. It is
assumed that the implementation of the pre-filter and term-breaker functions is the same on each server and
will return the same result.
Unloading an external library from one server process space does not unload the library from other server
process spaces.
SAP IQ provides the -sf startup switch to enable or disable loading of external third-party libraries.
You can specify this switch in either the server startup command line or the server configuration file.
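To enable the loading of external third-party libraries:
-sf -external_library_full_text
To disable the loading of external third-party libraries:
-sf external_library_full_text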
To view a list of the libraries currently loaded in the server, use the sa_list_external_library stored
procedure.
Use the system procedure dbo.sa_external_library_unload to unload an external library when the
library is not in use.
call sa_external_library_unload('library.dll')
Learn about querying large object data, including the full text search capability that handles unstructured and
semistructured data.
In this section:
Performance Monitoring of LONG BINARY and LONG VARCHAR Columns
The performance monitor displays performance data for LONG BINARY and LONG VARCHAR columns.
Full text search uses TEXT indexes to quickly find all instances of a term (word) in a database without having to
scan table rows.
TEXT indexes store positional information for terms in the indexed columns. Using a TEXT index to find rows
that contain a term is faster than scanning every row in the table.
Full text search uses the CONTAINS search condition, which differs from searching using predicates such as
LIKE, REGEXP, and SIMILAR TO, because the matching is term-based and not pattern-based.
String comparisons in full text search use all the normal collation settings for the database. For example, if you
configure the database to be case-insensitive, then full text searches are also case-insensitive.
Using full text search, you can search for terms, prefixes, or phrases (sequences of terms). You can also
combine multiple terms, phrases, or prefixes into Boolean expressions, or use proximity searches to require
that expressions appear near to each other.
Perform a full text search using a CONTAINS clause in either a WHERE clause, or a FROM clause of a SELECT
statement.
When performing a full text search for a list of terms, the order of terms is not important unless they are within
a phrase.
If you put the terms within a phrase, the database server looks for those terms in exactly the same order, and
same relative positions, in which you specified them.
When performing a term or phrase search, if terms are dropped from the query because they exceed term
length settings or because they are in the stoplist, you can get back a different number of rows than you
expect. This is because removing the terms from the query is equivalent to changing your search criteria. For
You can search for the terms that are considered keywords of the CONTAINS clause grammar, as long as they
are within phrases.
Term Searching
In the sample database, a text index called MarketingTextIndex has been built on the Description column of
the MarketingInformation table. The following statement queries the MarketingInformation.Description column
and returns the rows where the value in the Description column contains the term cotton.
ID Description
The following example queries the MarketingInformation table and returns a single value for each row
indicating whether the value in the Description column contains the term cotton.
ID Results
901 0
902 0
903 0
904 0
905 0
906 1
907 0
908 1
909 1
910 1
ID Score Description
Phrase Searching
When performing a full text search for a phrase, you enclose the phrase in double quotes. A column matches if
it contains the terms in the specified order and relative positions.
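The difference between a list of terms and a phrase can be sketched in Python with a small positional index. This is an illustration only: the sample text is hypothetical (it is not taken from the MarketingInformation table), matching is case-insensitive by assumption, and the function names are not part of SAP IQ.

```python
import re


def split_terms(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def positions_by_term(text: str) -> dict[str, list[int]]:
    """Map each term to the 1-based positions at which it occurs."""
    index: dict[str, list[int]] = {}
    for position, term in enumerate(split_terms(text), start=1):
        index.setdefault(term, []).append(position)
    return index


def contains_all_terms(text: str, query: str) -> bool:
    """Term-list search: every term must occur, in any order."""
    index = positions_by_term(text)
    return all(term in index for term in split_terms(query))


def contains_phrase(text: str, phrase: str) -> bool:
    """Phrase search: terms must occur in order at consecutive positions."""
    index = positions_by_term(text)
    terms = split_terms(phrase)
    if not terms:
        return False
    return any(
        all(start + offset in index.get(term, [])
            for offset, term in enumerate(terms))
        for start in index.get(terms[0], [])
    )


text = "Shirt made from organically grown cotton"   # hypothetical row value
print(contains_all_terms(text, "cotton grown"))     # order does not matter
print(contains_phrase(text, "grown cotton"))        # order and adjacency match
print(contains_phrase(text, "cotton grown"))        # wrong order
```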
You cannot specify CONTAINS keywords, such as AND or FUZZY, as terms to search for unless you place them
inside a phrase (single term phrases are allowed). For example, the statement below is acceptable even though
NOT is a CONTAINS keyword.
The following statement queries MarketingInformation.Description for the phrase "grown cotton", and
shows the score for each match:
ID Score Description
The full text search feature allows you to search for the beginning portion of a term, also known as a prefix
search.
To perform a prefix search, you specify the prefix you want to search for, followed by an asterisk. This is called a
prefix term.
Keywords for the CONTAINS clause cannot be used for prefix searching unless they are in a phrase.
You also can specify multiple prefix terms in a query string, including within phrases (for example, '"shi*
fab"').
The following example queries the MarketingInformation table for items that start with the prefix shi:
ID Score Description
ID 906 has the highest score because the term shield occurs less frequently than shirt in the text index.
• If a prefix term is longer than the MAXIMUM TERM LENGTH, it is dropped from the query string since
there can be no terms in the text index that exceed the MAXIMUM TERM LENGTH. So, on a text index with
MAXIMUM TERM LENGTH 3, searching for 'red appl*' is equivalent to searching for 'red'.
• If a prefix term is shorter than MINIMUM TERM LENGTH, and is not part of a phrase search, the prefix
search proceeds normally. So, on a GENERIC text index where MINIMUM TERM LENGTH is 5, searching for
'macintosh a*' returns indexed rows that contain macintosh and any terms of length 5 or greater that
start with a.
• If a prefix term is shorter than MINIMUM TERM LENGTH, but is part of a phrase search, the prefix term
is dropped from the query. So, on a GENERIC text index where MINIMUM TERM LENGTH is 5, searching
for '"macintosh appl* turnover"' is equivalent to searching for macintosh followed by any term
followed by turnover. A row containing "macintosh turnover" is not found; there must be a term
between macintosh and turnover.
On NGRAM text indexes, prefix searching can return unexpected results since an NGRAM text index contains
only n-grams, and contains no information about the beginning of terms. Query terms are also broken into
• If a prefix term is shorter than the n-gram length (MAXIMUM TERM LENGTH), the query returns all indexed
rows that contain n-grams starting with the prefix term. For example, on a 3-gram text index, searching for
'ea*' returns all indexed rows containing n-grams starting with ea. So, if the terms weather and fear were
indexed, the rows would be considered matches since their n-grams include eat and ear, respectively.
• If a prefix term is longer than n-gram length, and is not part of a phrase, and not an argument in a proximity
search, the prefix term is converted to an n-grammed phrase and the asterisk is dropped. For example, on
a 3-gram text index, searching for 'purple blac*' is equivalent to searching for '"pur urp rpl ple"
AND "bla lac"'.
• For phrases, the following behavior also takes place:
• If the prefix term is the only term in the phrase, it is converted to an n-grammed phrase and the
asterisk is dropped. For example, on a 3-gram text index, searching for '"purpl*"' is equivalent to
searching for '"pur urp rpl"'.
• If the prefix term is in the last position of the phrase, the asterisk is dropped and the terms are
converted to a phrase of n-grams. For example, on a 3-gram text index, searching for '"purple
blac*"' is equivalent to searching for '"pur urp rpl ple bla lac"'.
• If the prefix term is not in the last position of the phrase, the phrase is broken up into phrases that are
ANDed together. For example, on a 3-gram text index, searching for '"purp* blac*"' is equivalent
to searching for '"pur urp" AND "bla lac"'.
• If a prefix term is an argument in a proximity search, the proximity search is converted to an AND. For
example, on a 3-gram text index, searching for 'red NEAR[1] appl*' is equivalent to searching for 'red
AND "app ppl"'.
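The n-gram rewrites listed above can be reproduced with a few lines of Python. This is only an illustrative model of the documented equivalences, not SAP IQ's implementation; the function names (ngrams, rewrite_terms, rewrite_phrase_prefix_last, short_prefix_matches) are hypothetical, and every term other than a short prefix is assumed to be at least n characters long.

```python
def ngrams(term, n):
    """Return the overlapping n-grams of a term (empty if the term is shorter than n)."""
    return [term[i:i + n] for i in range(len(term) - n + 1)]


def rewrite_terms(query, n):
    """Rewrite space-separated terms (not a phrase) as n-gram phrases joined by AND.

    A trailing asterisk is dropped, as described for prefix terms longer than n.
    """
    phrases = []
    for term in query.split():
        grams = ngrams(term.rstrip("*"), n)
        phrases.append('"' + " ".join(grams) + '"')
    return " AND ".join(phrases)


def rewrite_phrase_prefix_last(phrase, n):
    """Rewrite a phrase whose prefix term is in the last position as one n-gram phrase."""
    grams = []
    for term in phrase.split():
        grams.extend(ngrams(term.rstrip("*"), n))
    return '"' + " ".join(grams) + '"'


def short_prefix_matches(prefix, indexed_term, n):
    """True if any n-gram of the indexed term starts with a prefix shorter than n."""
    return any(gram.startswith(prefix) for gram in ngrams(indexed_term, n))


print(rewrite_terms("purple blac*", 3))
print(rewrite_phrase_prefix_last("purple blac*", 3))
print(short_prefix_matches("ea", "weather", 3), short_prefix_matches("ea", "fear", 3))
```

The three calls correspond to the 'purple blac*', '"purple blac*"', and 'ea*' examples in the list above.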
The full text search feature allows you to search for terms that are near each other in a single column. This
is called a proximity search. To perform a proximity search, you specify two terms with either the keywords
NEAR or BEFORE between them, or the tilde (~).
You can use proximity-expression to search for terms that are near each other. For example, <b> NEAR[2,5]
<c> searches for instances of <b> and <c> that are at most 5 and at least 2 terms away from each other. The
order of terms is not significant; <b> NEAR <c> is equivalent to <c> NEAR <b>. If NEAR is specified without a
distance, a default of 10 terms is applied. You can specify a tilde (~) instead of NEAR. This is equivalent to
specifying NEAR without a distance, so a default of 10 terms is applied. NEAR expressions cannot be chained
together (for example, <a> NEAR[1] <b> NEAR[1] <c>).
BEFORE is like NEAR except that the order of terms is significant. <b> BEFORE <c> is not equivalent to <c>
BEFORE <b>; in the former, the term <b> must precede <c>, while in the latter, the term <b> must follow
<c>. BEFORE accepts both minimum and maximum distances (like NEAR). The default minimum distance is
1. The minimum distance, if given, must be less than or equal to the maximum distance; otherwise, an error is
returned.
If you do not specify a distance, the database server uses 10 as the default distance.
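As a rough illustration of the NEAR and BEFORE rules, the following Python sketch checks a proximity condition against a list of terms. It is a simplified model, not the server's algorithm: the function name proximity_match is hypothetical, distance is assumed to be the difference between term positions, and the default minimum distance of 1 (documented for BEFORE) is assumed to apply to NEAR as well.

```python
def proximity_match(terms, first, second, max_distance=10, min_distance=1, ordered=False):
    """Return True if `first` and `second` occur within the given distance range.

    ordered=False models NEAR (order is not significant);
    ordered=True models BEFORE (`first` must precede `second`).
    """
    if min_distance > max_distance:
        # The documentation states that this case returns an error.
        raise ValueError("minimum distance must not exceed maximum distance")
    first_positions = [i for i, t in enumerate(terms) if t == first]
    second_positions = [i for i, t in enumerate(terms) if t == second]
    for i in first_positions:
        for j in second_positions:
            gap = (j - i) if ordered else abs(j - i)
            if min_distance <= gap <= max_distance:
                return True
    return False


row = "the red ripe apple".split()
print(proximity_match(row, "red", "apple", max_distance=1))
print(proximity_match(row, "apple", "red", max_distance=2))
print(proximity_match(row, "apple", "red", max_distance=2, ordered=True))
```

In this model, red and apple are two positions apart, so the first call fails, the second succeeds in either order, and the third fails because apple does not precede red.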
In a proximity search using an NGRAM text index, if you specify a prefix term as an argument, the proximity
search is converted to an AND expression. For example, on a 3-gram text index, searching for 'red NEAR[1]
appl*' is equivalent to searching for 'red AND "app ppl"'. Since this is no longer a proximity search,
the search is no longer restricted to a single column in the case where multiple columns are specified in the
CONTAINS clause.
Example
Suppose you want to search MarketingInformation.Description for the term fabric within 10 terms of the term
skin. You can execute the following statement.
ID Score Description
Since the default distance is 10 terms, you did not need to specify a distance. By extending the distance by one
term, however, another row is returned:
The score for ID 903 is higher because the terms are closer together.
You can specify multiple terms separated by Boolean operators such as AND, OR, and AND NOT when
performing full text searches.
The AND operator matches a row if it contains both of the terms specified on either side of the AND. You can
also use an ampersand (&) for the AND operator. If terms are specified without an operator between them,
AND is implied.
For example, each of the following statements finds rows in MarketingInformation.Description that contain the
term fabric and a term that begins with ski:
SELECT *
FROM MarketingInformation
WHERE CONTAINS ( MarketingInformation.Description, 'ski* AND fabric' );
SELECT *
FROM MarketingInformation
WHERE CONTAINS ( MarketingInformation.Description, 'fabric & ski*' );
SELECT *
FROM MarketingInformation
WHERE CONTAINS ( MarketingInformation.Description, 'ski* fabric' );
The OR operator matches a row if it contains at least one of the specified search terms on either side of the OR.
You can also use a vertical bar (|) for the OR operator; the two are equivalent.
For example, either statement below returns rows in the MarketingInformation.Description that contain either
the term fabric or a term that starts with ski:
SELECT *
FROM MarketingInformation
WHERE CONTAINS ( MarketingInformation.Description, 'ski* OR fabric' );
SELECT *
FROM MarketingInformation
WHERE CONTAINS ( MarketingInformation.Description, 'fabric | ski*' );
The AND NOT operator finds results that match the left argument and do not match the right argument. You
can also use a hyphen (-) for the AND NOT operator; the two are equivalent.
SELECT *
FROM MarketingInformation
WHERE CONTAINS ( MarketingInformation.Description, 'fabric AND NOT ski*' );
SELECT *
FROM MarketingInformation
WHERE CONTAINS ( MarketingInformation.Description, 'fabric -ski*' );
SELECT *
FROM MarketingInformation
WHERE CONTAINS ( MarketingInformation.Description, 'fabric & -ski*' );
The Boolean operators can be combined in a query string. For example, the following statements are equivalent
and search the MarketingInformation.Description column for items that contain fabric and skin, but not
cotton:
SELECT *
FROM MarketingInformation
WHERE CONTAINS ( MarketingInformation.Description, 'skin fabric -cotton' );
SELECT *
FROM MarketingInformation
WHERE CONTAINS ( MarketingInformation.Description, 'fabric -cotton AND skin' );
The following statements are equivalent and search the MarketingInformation.Description column for items
that contain fabric or both cotton and skin:
SELECT *
FROM MarketingInformation
WHERE CONTAINS ( MarketingInformation.Description, 'fabric | cotton AND skin' );
SELECT *
FROM MarketingInformation
WHERE CONTAINS ( MarketingInformation.Description, 'cotton skin OR fabric' );
Terms and expressions can be grouped with parentheses. For example, the following statement searches the
MarketingInformation.Description column for items that contain cotton or fabric, and that have terms that
start with ski.
You can perform a full text search across multiple columns in a single query, as long as the columns are part of
the same text index.
SELECT *
FROM <t>
WHERE CONTAINS ( <t.c1>, <t.c2>, '<term1>|<term2>' );
SELECT *
FROM <t>
WHERE CONTAINS( <t.c1>, '<term1>' )
OR CONTAINS( <t.c2>, '<term2>' );
The first query matches if either <t.c1> or <t.c2> contains either <term1> or <term2>.
The second query matches if <t.c1> contains <term1>, or if <t.c2> contains <term2>. Using
CONTAINS in this manner also returns scores for the matches.
Fuzzy searching can be used to search for misspellings or variations of a word. To do so, use the FUZZY
operator followed by a string in double quotes to find an approximate match for the string. For example,
CONTAINS ( Products.Description, 'FUZZY "cotton"' ) returns cotton and misspellings such as coton or cotten.
Note
You can only perform fuzzy searches on text indexes built using the NGRAM term breaker.
Using the FUZZY operator is equivalent to breaking the string manually into substrings of length <n> and
separating them with OR operators. For example, suppose you have a text index configured with the NGRAM
term breaker and a MAXIMUM TERM LENGTH of 3. Specifying 'FUZZY "500 main street"' is equivalent
to specifying '500 OR mai OR ain OR str OR tre OR ree OR eet'.
The FUZZY operator is useful in a full text search that returns a score. This is because many approximate
matches may be returned, but usually only the matches with the highest scores are meaningful.
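The equivalence between FUZZY and a chain of OR operators can be checked with a short Python sketch. This only models the documented rewrite; the function name fuzzy_to_or is hypothetical, and terms are assumed to be separated by white space.

```python
def fuzzy_to_or(text, n):
    """Break each term of a fuzzy string into n-grams and join all n-grams with OR."""
    grams = []
    for term in text.split():
        for i in range(len(term) - n + 1):
            grams.append(term[i:i + n])
    return " OR ".join(grams)


print(fuzzy_to_or("500 main street", 3))
```

With a MAXIMUM TERM LENGTH of 3, the call reproduces the expansion of 'FUZZY "500 main street"' given above.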
To use a full text search on a view or derived table, you must build a text index on the columns in the base table
that you want to perform a full text search on.
The following statements create a view on the MarketingInformation table in the sample database, which
already has a text index, and then perform a full text search on that view.
To create a view on the MarketingInformation base table, execute the following statement:
Using the following statement, you can query the view using the text index on the underlying table.
SELECT *
FROM MarketingInfoView
WHERE CONTAINS ( "Desc", 'Cap OR Tee*' );
You can also query a derived table using the text index on the underlying table:
SELECT *
FROM (
SELECT MI.ProductID, MI."Description"
FROM MarketingInformation AS MI
WHERE MI."ID" > 4 ) AS dt ( P_ID, "Desc" )
WHERE CONTAINS ( "Desc", 'Base*' );
Note
The columns on which you want to run the full text search must be included in the SELECT list of the view
or derived table.
Searching a view using a text index on the underlying base table is restricted as follows:
• The view cannot contain a TOP, FIRST, DISTINCT, GROUP BY, ORDER BY, UNION, INTERSECT, EXCEPT
clause, or window function.
• The view cannot contain aggregate functions.
• A CONTAINS query can refer to a base table inside a view, but not to a base table inside a view that is inside
another view.
Syntax
<table-expression> ::=
{ <table-spec>
| <table-expression> <join-type> <table-spec> [ ON <condition> ]
| ( <table-expression> [, …] ) }
<table-spec> ::=
{ [ <userid>. ] <table-name> [ [ AS ] <correlation-name> ]
| <select-statement> [ AS <correlation-name> ( <column-name> [, …] ) ] }
<contains-expression> ::=
{ <table-name> | <view-name> }
CONTAINS ( <column-name> [, …], <contains-query> )
[ [ AS ] <score-correlation-name> ]
Usage
<contains-expression> – use the CONTAINS clause after a table name to filter the table, and return only
those rows matching the full text query specified with <contains-query>.
Every matching row of the table is returned, along with a score column that can be referred to using <score-
correlation-name>, if it is specified. If <score-correlation-name> is not specified, then the score
column can be referred to by the default correlation name, <contains>.
With the exception of the optional correlation name argument, the CONTAINS clause takes the same
arguments as the CONTAINS search condition. There must be a TEXT index on the columns listed in the
CONTAINS clause.
Perform a full text query using the CONTAINS clause in the FROM clause of a SELECT statement, or by using the
CONTAINS search condition (predicate) in a WHERE clause.
Both methods return the same rows; however, the CONTAINS clause also returns scores for the matching rows.
Syntax
<contains-query-string> ::=
<simple-expression> | <or-expression>
<simple-expression> ::=
<primary-expression> | <and-expression>
<or-expression> ::=
<simple-expression> { OR | | } <contains-query-string>
<primary-expression> ::=
<basic-expression>
| FUZZY " <fuzzy-expression> "
| <and-not-expression>
<and-expression> ::=
<primary-expression> [ AND | & ] <simple-expression>
<and-not-expression> ::=
<primary-expression> [ AND | & ] { NOT | - } <basic-expression>
<basic-expression> ::=
<term>
| <phrase>
| ( <contains-query-string> )
| <proximity-expression>
<fuzzy-expression> ::=
<term> | <fuzzy-expression> <term>
<term> ::=
<simple-term> | <prefix-term>
<prefix-term> ::=
<simple-term>*
<phrase> ::=
" <phrase-string> "
<proximity-expression> ::=
<term> { BEFORE | NEAR } [ [ <minimum-distance>, ] <maximum-distance> ] <term>
| <term> { BEFORE | NEAR | ~ } <term>
<phrase-string> ::=
<term> | <phrase-string> <term>
Parameters
simple-term
A string separated by white space and special characters that represents a single indexed term (word) for
which to search.
distance
A positive integer.
and-expression
To specify that both <primary-expression> and <simple-expression> must be found in the TEXT
index. By default, if no operator is specified between terms or expressions, an and-expression is assumed.
For example, 'a b' is interpreted as 'a AND b'. An ampersand (&) can be used instead of AND, and can
abut the expressions or terms on either side (for example, 'a & b').
and-not-expression
To specify that <primary-expression> must be present in the TEXT index, but that <basic-
expression> must not be found in the TEXT index. This is also known as a negation. When you use a
hyphen for negation, a space must precede the hyphen, and the hyphen must abut the subsequent term.
fuzzy-expression
To find terms that are similar to what you specify. Fuzzy matching is only supported on NGRAM TEXT
indexes.
proximity-expression
To search for terms that are near each other. For example, 'b NEAR[2,5] c' searches for instances of b
and c that are at most 5 and at least 2 terms away from each other. The order of terms is not significant;
'b NEAR c' is equivalent to 'c NEAR b'. If NEAR is specified without <distance>, a default of 10 terms
is applied. You can specify a tilde (~) instead of NEAR. This is equivalent to specifying NEAR without a
distance so a default of 10 terms is applied. NEAR expressions cannot be chained together (for example, 'a
NEAR[1] b NEAR[1] c').
BEFORE is like NEAR, except that the order of terms is significant. 'b BEFORE c' is not equivalent to 'c
BEFORE b'; in the former, the term 'b' must precede 'c' while in the latter the term 'b' must follow
'c'. BEFORE accepts both minimum and maximum distances like NEAR. The default minimum distance is
1. The minimum distance, if given, must be less than or equal to the maximum distance; otherwise, an error
is returned.
prefix-term
To search for terms that start with the specified prefix. For example, 'datab*' searches for any term
beginning with datab. This is also known as a prefix search. In a prefix search, matching is performed
for the portion of the term to the left of the asterisk.
Usage
The CONTAINS search condition takes a column list and <contains-query-string> as arguments.
The CONTAINS search condition can be used anywhere a search condition (also referred to as predicate)
can be specified, and returns TRUE or FALSE. <contains-query-string> must be a constant string, or a
variable, with a value that is known at query time.
If multiple columns are specified, then they must all refer to a single base table; a TEXT index cannot span
multiple base tables. You can reference the base directly in the FROM clause, or use it in a view or derived table,
provided that the view or derived table does not use DISTINCT, GROUP BY, ORDER BY, UNION, INTERSECT,
EXCEPT, or a row limitation.
Queries using ANSI join syntax are supported (FULL OUTER JOIN, RIGHT OUTER JOIN, LEFT OUTER
JOIN), but may have suboptimal performance. Use outer joins for CONTAINS in the FROM clause only if the
score column from each of the CONTAINS clauses is required. Otherwise, move CONTAINS to an ON condition
or WHERE clause.
• Remote queries using an SAP SQL Anywhere table with a full TEXT index that is joined to a remote table.
If you use a SQL variable less than 32KB in length as a search term and the type of variable is LONG VARCHAR,
use CAST to convert the variable to VARCHAR data type. For example:
The following warnings apply to the use of non-alphanumeric characters in query strings:
Within phrases, the asterisk is the only special character that continues to be interpreted as a special
character. All other special characters within phrases are treated as white space and serve as term breakers.
1. Interpretation of operators and precedence: During this step, keywords are interpreted as operators, and
rules of precedence are applied.
2. Application of text configuration object settings: During this step, the text configuration object settings are
applied to terms. Any query terms that exceed the term length settings, or that are in the stop list, are
dropped.
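During query evaluation, operators are applied in the following order of precedence, highest first:
1. FUZZY, NEAR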
2. AND NOT
3. AND
4. OR
An asterisk can occur at the end of the query string, or be followed by a space, ampersand, vertical bar, closing
bracket, or closing quotation mark. Any other usage of asterisk returns an error.
• 'th*&best' is equivalent to 'th* AND best' and 'th* best': find any term beginning with th, and the term best.
• 'th*|best' is equivalent to 'th* OR best': find either any term beginning with th, or the term best.
• 'very&(best|th*)' is equivalent to 'very AND (best OR th*)': find the term very, and the term best or any term beginning with th.
• '"fast auto*"': find the term fast, immediately followed by a term beginning with auto.
• '"auto* price"': find a term beginning with auto, immediately followed by the term price.
Note
Interpretation of query strings containing asterisks varies depending on the text configuration object
settings.
The hyphen can be used in a query for term or expression negation, and is equivalent to NOT.
Whether a hyphen is interpreted as a negation depends on its location in the query string. For example, when a
hyphen immediately precedes a term or expression, it is interpreted as a negation. If the hyphen is embedded
within a term, it is interpreted as a hyphen.
A hyphen used for negation must be preceded by a white space, and followed immediately by an expression.
When used in a phrase of a fuzzy expression, the hyphen is treated as white space and used as a term breaker.
• 'the -best' is equivalent to 'the AND NOT best', 'the AND -best', 'the & -best', and 'the NOT best': find the term the, and not the term best.
• 'the -(very best)' is equivalent to 'the AND NOT (very AND best)': find the term the, and not the terms very and best.
• 'the -"very best"' is equivalent to 'the AND NOT "very best"': find the term the, and not the phrase very best.
• 'alpha-numerics' is equivalent to '"alpha numerics"': find the term alpha, immediately followed by the term numerics.
• 'wild - west' is equivalent to 'wild west' and 'wild AND west': find the term wild, and the term west.
The table "Special character interpretations" shows the allowed syntax for all special characters, except
asterisk and hyphen.
The asterisk and hyphen characters are not considered special characters, if they are found in a phrase, and
are dropped.
Note
The restrictions on specifying string literals also apply to the query string. For example, apostrophes must
be within an escape sequence.
Character or Syntax: Usage Examples and Remarks
ampersand (&): The ampersand is equivalent to AND, and can be specified as follows:
• 'a & b'
• 'a &b'
• 'a& b'
• 'a&b'
vertical bar (|): The vertical bar is equivalent to OR, and can be specified as follows:
• 'a| b'
• 'a |b'
• 'a | b'
• 'a|b'
double quotes ("): Double quotes are used to contain a sequence of terms where order and relative distance are
important. For example, in the query string 'learn "full text search"', "full text search" is a phrase. In
this example, learn can come before or after the phrase, or exist in another column (if the TEXT index is
built on more than one column), but the exact phrase must be found in a single column.
parentheses (): Parentheses are used to specify the order of evaluation of expressions, if different from the
default order. For example, 'a AND (b|c)' is interpreted as a, and b or c.
tilde (~): The tilde is equivalent to NEAR, and has no special syntax rules. The query string 'full~text' is
equivalent to 'full NEAR text', and is interpreted as: the term full within 10 terms of the term text.
square brackets []: Square brackets are used in conjunction with the keyword NEAR to contain <distance>. Other
uses of square brackets return an error.
TEXT indexes are built according to the settings defined for the text configuration object used to create the
TEXT index. A TEXT index excludes terms that meet any of the following conditions:
The same rules apply to query strings. The dropped term can match zero or more terms at the end or
beginning of the phrase. For example, suppose the term 'the' is in the stop list:
• If the term appears on either side of an AND, OR, or NEAR, then both the operator and the term are removed.
For example, searching for 'the AND apple', 'the OR apple', or 'the NEAR apple' are equivalent
to searching for 'apple'.
• If the term appears on the right side of an AND NOT, both the AND NOT and the term are dropped. For
example, searching for 'apple AND NOT the' is equivalent to searching for 'apple'.
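The two operator rules above can be modeled in Python as follows. This is a minimal sketch that covers only the cases named in the list (a stop-list term next to AND, OR, or NEAR, or on the right side of AND NOT); it does not handle phrases, parentheses, or symbolic operators, and the function name drop_stop_terms is hypothetical.

```python
OPERATORS = {"AND", "OR", "NEAR", "AND NOT"}


def drop_stop_terms(query, stoplist):
    """Remove stop-list terms from a query together with one adjacent operator."""
    # Merge the two-word operator AND NOT into a single item.
    items = []
    for word in query.split():
        if word == "NOT" and items and items[-1] == "AND":
            items[-1] = "AND NOT"
        else:
            items.append(word)

    kept = []
    skip_next_operator = False
    for item in items:
        if item in OPERATORS:
            if skip_next_operator:
                skip_next_operator = False  # operator after a dropped leading term
                continue
            kept.append(item)
        elif item.lower() in stoplist:
            if kept and kept[-1] in OPERATORS:
                kept.pop()                  # drop the operator before the stop term
            else:
                skip_next_operator = True   # drop the operator after the stop term
        else:
            kept.append(item)
            skip_next_operator = False
    return " ".join(kept)


for q in ("the AND apple", "the OR apple", "the NEAR apple", "apple AND NOT the"):
    print(q, "->", drop_stop_terms(q, {"the"}))
```

Each of the four sample queries reduces to apple, matching the equivalences stated in the list.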
Note
If all of the terms for which you are searching are dropped, SAP IQ returns the error CONTAINS has NULL
search term. SAP SQL Anywhere reports no error and returns zero rows.
You can sort query results using the score that indicates the closeness of a match.
When you include a CONTAINS clause in the FROM clause of a query, each match has a score associated with
it. The score indicates how close the match is, and you can use score information to sort the data. Two main
criteria determine score:
• The number of times a term appears in the indexed row. The more times a term appears in an indexed row,
the higher its score.
• The number of times a term appears in the TEXT index. The more times a term appears in a TEXT index,
the lower its score.
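To make the two criteria concrete, the following Python sketch computes a score that rises with the term's count in the row and falls with its count across the whole index. The formula is an assumption chosen only to show the direction of each effect; it is not the scoring function that SAP IQ uses, and the name illustrative_score is hypothetical.

```python
import math


def illustrative_score(term, row_terms, index_rows):
    """Toy score: higher for more occurrences in the row, lower for more in the index."""
    in_row = row_terms.count(term)
    in_index = sum(row.count(term) for row in index_rows)
    if in_row == 0 or in_index == 0:
        return 0.0
    total_terms = sum(len(row) for row in index_rows)
    # Assumed formula: row frequency weighted by the rarity of the term in the index.
    return in_row * math.log(1 + total_terms / in_index)


rows = [["skin", "fabric", "skin"], ["fabric", "cotton"], ["fabric"]]
print(illustrative_score("skin", rows[0], rows))
print(illustrative_score("fabric", rows[0], rows))
```

In this toy data, skin appears twice in the first row and is rare in the index, so it scores higher there than fabric, which appears once in the row and in every row of the index.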
Depending on the type of full text search, other criteria affect scoring. For example, in proximity searches, the
proximity of search terms impacts scoring. By default, the result set of a CONTAINS clause has the correlation
name contains that has a single column in it called score. You can refer to "contains".score in the
SELECT list, ORDER BY clause, or other parts of the query. However, because contains is a SQL reserved
word, you must remember to put it in double quotes. Alternatively, you can specify another correlation name,
for example, CONTAINS ( expression ) AS ct. The examples for full text search refer to the score
column as ct.score.
This statement searches MarketingInformation.Description for terms starting with "stretch" or terms
starting with "comfort":
Fuzzy and non-fuzzy search capability over a TEXT index is possible for TEXT indexes of type NGRAM.
Fuzzy search capability over a TEXT index is possible only if the TEXT index is of type NGRAM. The GENERIC
TEXT index cannot handle the fuzzy search.
Fuzzy searching can be used to search for misspellings or variations of a word. To do so, use the FUZZY
operator followed by a string in double quotes to find an approximate match for the string.
Using the FUZZY operator is equivalent to breaking the string manually into substrings of length <n> and
separating them with OR operators. For example, if you have a text index configured with the NGRAM term
breaker and a MAXIMUM TERM LENGTH of 3, specifying 'FUZZY "500 main street"' is equivalent to
specifying '500 OR mai OR ain OR str OR tre OR ree OR eet'.
The FUZZY operator is useful in a full text search that returns a score. Many approximate matches may be
returned, but usually only the matches with the highest scores are meaningful.
Note
Fuzzy search does not support prefix or suffix searching. For example, the search clause cannot be “v*” or
“*vis”.
After inserting the data, execute this query to perform fuzzy searching over an NGRAM TEXT index:
a b
2 book he ookw worm okwo kwor
5 he is a bookworm
6 boo ook okw kwo wor orm
a b
1 hello this is hira
4 hello this is evaa
a b
1 hello this is hira
4 hello this is evaa
Non-fuzzy search on NGRAM breaks the term into corresponding n-grams and searches for the n-grams in the
NGRAM TEXT index.
The query CONTAINS ( M.Description, 'ams' ) ct; illustrates a non-fuzzy NGRAM search over a
2GRAM index, which is semantically equal to searching query CONTAINS( M.Description, '"am ms"' )
ct;
If you search for the term 'v*' on a 2GRAM index, then any 2-gram consisting of v followed by any letter is
considered a match for the search term and is returned in the result.
The query CONTAINS (M.Description, 'white whale') ct; illustrates a non-fuzzy NGRAM search over a
3GRAM index and is semantically equal to searching query CONTAINS (M.Description, '"whi hit ite
wha hal ale"');
The difference between NGRAM fuzzy and non-fuzzy search is that fuzzy search is a disjunction over individual
GRAMS. Non-fuzzy search is a conjunction over the individual GRAMS. When GENERIC and NGRAM TEXT
indexes are created on the same column, then the GENERIC TEXT index is used for a query with non-fuzzy
search and the NGRAM TEXT index is used for fuzzy search.
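The difference between the two search modes can be sketched in Python. This is a simplified model under stated assumptions: it ignores n-gram positions (a real non-fuzzy search is rewritten as a phrase of n-grams), terms shorter than n contribute no n-grams, and the function names are hypothetical.

```python
def text_ngrams(text, n):
    """Break every white-space-separated term of the text into n-grams."""
    grams = []
    for term in text.split():
        grams.extend(term[i:i + n] for i in range(len(term) - n + 1))
    return grams


def non_fuzzy_phrase(query, n):
    """Show the n-gram phrase that a non-fuzzy query is equivalent to."""
    return '"' + " ".join(text_ngrams(query, n)) + '"'


def fuzzy_match(query, row_text, n):
    """Disjunction: at least one n-gram of the query occurs in the row."""
    row = set(text_ngrams(row_text, n))
    return any(g in row for g in text_ngrams(query, n))


def non_fuzzy_match(query, row_text, n):
    """Conjunction: every n-gram of the query occurs in the row (positions ignored)."""
    row = set(text_ngrams(row_text, n))
    return all(g in row for g in text_ngrams(query, n))


print(non_fuzzy_phrase("ams", 2))
print(non_fuzzy_phrase("white whale", 3))
print(fuzzy_match("bookwerm", "he is a bookworm", 3))
print(non_fuzzy_match("bookwerm", "he is a bookworm", 3))
```

The first two calls reproduce the '"am ms"' and '"whi hit ite wha hal ale"' rewrites above. The misspelled query bookwerm matches the row only in the fuzzy model, because some but not all of its 3-grams occur in bookworm.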
Example: Non-Fuzzy Search After Creating a GENERIC TEXT Index on the Same Column
This query illustrates non-fuzzy search after creating a GENERIC TEXT index on the same column:
a b
5 he is a bookworm
Example: Fuzzy Search With Both NGRAM and GENERIC TEXT Indexes on the Same Column
This query illustrates fuzzy search with both NGRAM and GENERIC TEXT indexes on the same column:
a b
2 book he ookw worm okwo kwor
5 he is a bookworm
6 boo ook okw kwo wor orm
This query illustrates the behavior of a fuzzy search phrase in a non-fuzzy search clause:
In WHERE clauses of the SELECT statement, you can use LONG BINARY columns only in IS NULL and IS
NOT NULL expressions, in addition to the BYTE_LENGTH64, BYTE_SUBSTR64, BYTE_SUBSTR, BIT_LENGTH,
OCTET_LENGTH, CHARINDEX, and LOCATE functions.
You cannot use LONG BINARY columns in the SELECT statement clauses ORDER BY, GROUP BY, and HAVING
or with the DISTINCT keyword.
SAP IQ does not support LIKE predicates on LONG BINARY (BLOB) columns or variables. Searching for
a pattern in a LONG BINARY column using a LIKE predicate returns the error Invalid data type
comparison in predicate.
In WHERE clauses of the SELECT statement, you can use LONG VARCHAR columns only in IS NULL and IS NOT
NULL expressions, in addition to the BIT_LENGTH, CHAR_LENGTH, CHAR_LENGTH64, CHARINDEX, LOCATE,
OCTET_LENGTH, PATINDEX, SUBSTRING64, and SUBSTRING functions.
You can use the LIKE predicate to search for a pattern on a LONG VARCHAR column. All patterns of 126
or fewer characters are supported. Patterns longer than 254 characters are not supported. Some patterns
between 127 and 254 characters in length are supported, depending on the contents of the pattern.
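The three length ranges can be summarized as a small pre-check that an application might run before issuing a LIKE query. This is a client-side sketch only; the function name like_pattern_support and its return strings are hypothetical, and the outcome for patterns of 127 to 254 characters cannot be decided from the length alone.

```python
def like_pattern_support(pattern):
    """Classify a LIKE pattern for a LONG VARCHAR column by its length in characters."""
    length = len(pattern)
    if length <= 126:
        return "supported"
    if length <= 254:
        return "depends on pattern contents"
    return "not supported"


print(like_pattern_support("%error%"))
print(like_pattern_support("a" * 200))
print(like_pattern_support("a" * 255))
```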
The LIKE predicate supports LONG VARCHAR (CLOB) variables of any size of data. Currently, a SQL variable
can hold up to 2 GB - 1 in length.
You cannot use LONG VARCHAR columns in the SELECT statement clauses ORDER BY, GROUP BY, and HAVING
or with the DISTINCT keyword (SELECT DISTINCT and COUNT DISTINCT).
You can create a WORD (WD) index on a LONG VARCHAR (CLOB) column and use the CONTAINS predicate to
search the column for string constants of maximum length 255 characters.
The CONTAINS predicate is not supported on LONG BINARY (BLOB) columns using WD indexes. If you attempt
to search for a string in a LONG BINARY column with a WD index using a CONTAINS predicate, an error is
returned. TEXT indexes that use an external library support CONTAINS on binary data.
The performance monitor displays performance data for LONG BINARY and LONG VARCHAR columns.
Learn about stored procedure support for the LONG BINARY (BLOB) and LONG VARCHAR (CLOB) data type
columns and full text searching.
You can use stored procedures to break strings into terms, to discover how many terms are in the TEXT index
and their position, and to display statistical information about the TEXT indexes.
The sa_list_external_library stored procedure lists the libraries that are currently loaded in the server.
Once identified, use sa_external_library_unload to unload the library from the server.
The sp_iqsetcompression stored procedure controls the compression of large object columns.
sp_iqsetcompression controls the compression of columns of data type LONG BINARY and LONG VARCHAR
when writing database buffers to disk. You can also use sp_iqsetcompression to disable compression. This
functionality saves CPU cycles, because certain data formats stored in a LONG BINARY or LONG VARCHAR
column (for example, JPG files) are already compressed and gain nothing from additional compression.
The sp_iqshowcompression stored procedure displays the compression setting of large object columns.
• sp_iqsetcompression Procedure
• sp_iqshowcompression Procedure
The stored procedure sp_iqindexsize displays the size of an individual LONG BINARY and LONG VARCHAR
column.
sp_iqindexsize output that shows a LONG BINARY column with approximately 42 GB of data.
The page size is 128 KB. The largelob Info type is in the last row.
DBA test10.DBA.ASIQ_IDX_T128_C3_FP FP bt 0 0 0
sp_iqindexsize output that shows a LONG VARCHAR column with approximately 42 GB of data.
The page size is 128 KB. The largelob Info type is in the last row.
DBA test10.DBA.ASIQ_IDX_T128_C3_FP FP bt 0 0 0
The SAP IQ data extraction facility includes the BFILE function, which allows you to extract individual LONG
BINARY and LONG VARCHAR cells to individual operating system files on the server.
You can use BFILE with or without the data extraction facility.
Load LONG BINARY and LONG VARCHAR data using the extended syntax of the LOAD TABLE statement.
You can load large object data of unlimited size, unless restricted by the operating system, from a primary file
in ASCII or BCP format. The maximum length of fixed-width data loaded from a primary file into large object
columns is 32 K - 1.
You can also specify a secondary load file in the primary load file. Each individual secondary data file contains
exactly one LONG BINARY or LONG VARCHAR cell value.
LOAD TABLE has extended syntax for loading large object data.
Syntax
<load-column-specification>:
...
| { BINARY | ASCII } FILE( <integer> )
| { BINARY | ASCII } FILE ( '<string>' )
The keywords BINARY FILE (for LONG BINARY) or ASCII FILE (for LONG VARCHAR) specify to the load that
the primary input file for the column contains the path of the secondary file (which contains the LONG BINARY
or LONG VARCHAR cell value), rather than the LONG BINARY or LONG VARCHAR data itself. The secondary
file pathname can be either fully qualified or relative. If the secondary file pathname is not fully qualified, then
the path is relative to the directory in which the server was started. Tape devices are not supported for the
secondary file.
SAP IQ supports loading LONG BINARY and LONG VARCHAR values of unlimited length (subject to operating
system restrictions) in the primary load file. When binary data of hexadecimal format is loaded into a LONG
BINARY column from a primary file, SAP IQ requires that the total number of hexadecimal digits is an even number.
SAP IQ does not support loading large object columns from primary files using LOAD TABLE…FORMAT
BINARY. You can load large object data in binary format from secondary files.
For LOAD TABLE FORMAT BCP, the load specification may contain only column names, NULL, and
ENCRYPTED. This means that you cannot use secondary files when loading LONG BINARY and LONG VARCHAR
columns using the LOAD TABLE FORMAT BCP option.
1,boston,jpg,/s1/loads/lobs/boston.jpg,
2,map_of_concord,bmp,/s1/loads/maprs/concord.bmp,
3,zero length test,NULL,,
4,null test,NULL,NULL,
After the LONG BINARY data is loaded into table tab, the first and second rows for column lobcol contain the
contents of files boston.jpg and concord.bmp, respectively. The third and fourth rows contain a zero-length
value and NULL, respectively.
The database option SECONDARY_FILE_ERROR allows you to specify the action of the load if an error occurs
while opening or reading from a secondary BINARY FILE or ASCII FILE.
If SECONDARY_FILE_ERROR is ON, the load rolls back if an error occurs while opening or reading from a
secondary BINARY FILE or ASCII FILE.
If SECONDARY_FILE_ERROR is OFF (the default), the load continues, regardless of any errors that occur while
opening or reading from a secondary BINARY FILE or ASCII FILE. The LONG BINARY or LONG VARCHAR
cell is left with one of these values:
Any user can set SECONDARY_FILE_ERROR, either for the PUBLIC role or as a temporary setting; the option
setting takes effect immediately.
When logging integrity constraint violations to the load error ROW LOG file, the information logged for a LONG
BINARY or LONG VARCHAR column is:
• Actual text as read from the primary data file, if the logging occurs within the first pass of the load
operation
• Zero-length value, if the logging occurs within the second pass of the load operation
Trailing blanks are not stripped from LONG VARCHAR data, even if the STRIP option is on.
The LOAD TABLE...QUOTES option does not apply to loading LONG BINARY (BLOB) or LONG VARCHAR
(CLOB) data from the secondary file, regardless of its setting.
A leading or trailing quote is loaded as part of CLOB data. Two consecutive quotes between enclosing quotes
are loaded as two consecutive quotes with the QUOTES ON option.
Partial multibyte LONG VARCHAR data is truncated during the load according to the value of the
TRIM_PARTIAL_MBC database option.
• If TRIM_PARTIAL_MBC is ON, a partial multibyte character is truncated for both primary data and the LOAD
with ASCII FILE option.
• If TRIM_PARTIAL_MBC is OFF, the LOAD with ASCII FILE option handles the partial multibyte character
according to the value of the SECONDARY_FILE_ERROR database option.
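To make the term "partial multibyte character" concrete, the following Python sketch detects and drops an incomplete trailing character. It is an illustration only: it assumes UTF-8 data, the function name is hypothetical, and it does not reproduce the server's actual implementation of TRIM_PARTIAL_MBC.

```python
def trim_partial_utf8(data: bytes) -> bytes:
    """Drop an incomplete UTF-8 sequence at the end of data, if there is one."""
    # Step back over at most three continuation bytes (bit pattern 10xxxxxx)
    # to find the lead byte of the last character.
    i = len(data) - 1
    steps = 0
    while i >= 0 and steps < 3 and (data[i] & 0xC0) == 0x80:
        i -= 1
        steps += 1
    if i < 0:
        return data  # empty input or continuation bytes only: left untouched

    lead = data[i]
    if lead < 0x80:
        expected = 1          # single-byte (ASCII) character
    elif lead >> 5 == 0b110:
        expected = 2          # lead byte of a two-byte character
    elif lead >> 4 == 0b1110:
        expected = 3          # lead byte of a three-byte character
    elif lead >> 3 == 0b11110:
        expected = 4          # lead byte of a four-byte character
    else:
        return data           # not a valid lead byte: left untouched

    available = len(data) - i
    # Fewer bytes than the lead byte announces means the character is partial.
    return data[:i] if available < expected else data


complete = "caf\u00e9".encode("utf-8")   # ends with the two-byte character e-acute
partial = complete[:-1]                  # last byte cut off
print(trim_partial_utf8(partial))        # the dangling lead byte is removed
print(trim_partial_utf8(complete) == complete)
```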
The following table lists how a trailing multibyte character is loaded, depending on the values of
TRIM_PARTIAL_MBC and SECONDARY_FILE_ERROR:
Large object variables are supported by the LOAD TABLE, INSERT…VALUES, INSERT…SELECT, INSERT…
LOCATION, SELECT…INTO, and UPDATE SQL statements.
Learn about the characteristics of the large object LONG BINARY and LONG VARCHAR data type columns and
index support of large object data.
In this section:
Large Object Data Types LONG BINARY and BLOB
Binary large object (BLOB) data in SAP IQ is stored in columns of data type LONG BINARY or BLOB.
Large Object Data Types LONG VARCHAR and CLOB
Character large object (CLOB) data in SAP IQ is stored in columns of data type LONG VARCHAR or
CLOB.
Binary large object (BLOB) data in SAP IQ is stored in columns of data type LONG BINARY or BLOB.
An individual LONG BINARY data value can have a length ranging from zero (0) to 2 PB (petabytes) for an IQ
page size of 512 KB. (The maximum length is equal to 4 GB multiplied by the database page size.)
A table or database can contain any number of LONG BINARY columns up to the supported maximum columns
per table and maximum columns per database, respectively.
LONG BINARY columns can be either NULL or NOT NULL and can store zero-length values. The domain BLOB
is a LONG BINARY data type that allows NULL.
Modify LONG BINARY columns using the UPDATE, INSERT, LOAD TABLE, DELETE, TRUNCATE,
SELECT...INTO and INSERT...LOCATION SQL statements. Positioned updates and deletes are not
supported on LONG BINARY columns.
You can insert an SAP Adaptive Server Enterprise IMAGE column into a LONG BINARY column using the
INSERT...LOCATION command. All IMAGE data inserted is silently right-truncated at 2147483648 bytes (2
GB).
There are limited implicit data type conversions to and from the LONG BINARY data type and non-LONG
BINARY data types.
There are no implicit data type conversions from the LONG BINARY data type to another non-LONG BINARY
data type, except to the BINARY and VARBINARY data types for INSERT and UPDATE. There are implicit
conversions to LONG BINARY data type from TINYINT, SMALLINT, INTEGER, UNSIGNED INTEGER, BIGINT,
UNSIGNED BIGINT, CHAR, and VARCHAR data types. There are no implicit conversions from BIT, REAL,
DOUBLE, or NUMERIC data types to LONG BINARY data type. Implicit conversion can be controlled using the
CONVERSION_MODE database option.
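The implicit conversion rules for LONG BINARY can be summarized as a lookup. The following Python sketch only restates the rules listed above; the function and constant names are hypothetical, and the sketch does not model the effect of the CONVERSION_MODE option.

```python
# Source types with an implicit conversion to LONG BINARY.
IMPLICIT_TO_LONG_BINARY = {
    "TINYINT", "SMALLINT", "INTEGER", "UNSIGNED INTEGER",
    "BIGINT", "UNSIGNED BIGINT", "CHAR", "VARCHAR",
}

# Source types with no implicit conversion to LONG BINARY.
NO_IMPLICIT_TO_LONG_BINARY = {"BIT", "REAL", "DOUBLE", "NUMERIC"}

# Target types LONG BINARY converts to implicitly, and only in these statements.
IMPLICIT_FROM_LONG_BINARY = {"BINARY", "VARBINARY"}
STATEMENTS_ALLOWING_IMPLICIT_FROM = {"INSERT", "UPDATE"}


def implicit_to_long_binary(source_type):
    """Return True or False per the rules above, or None if the type is not listed."""
    name = source_type.upper()
    if name in IMPLICIT_TO_LONG_BINARY:
        return True
    if name in NO_IMPLICIT_TO_LONG_BINARY:
        return False
    return None  # the text above makes no statement about this type


def implicit_from_long_binary(target_type, statement):
    """Return True if LONG BINARY converts implicitly to target_type in statement."""
    return (target_type.upper() in IMPLICIT_FROM_LONG_BINARY
            and statement.upper() in STATEMENTS_ALLOWING_IMPLICIT_FROM)


print(implicit_to_long_binary("VARCHAR"))
print(implicit_to_long_binary("NUMERIC"))
print(implicit_from_long_binary("VARBINARY", "UPDATE"))
```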
The currently supported byte substring functions for the LONG BINARY data type are accepted as input for
implicit conversion for the INSERT and UPDATE statements.
The LONG BINARY data type can be explicitly converted to BINARY or VARBINARY. No other explicit data type
conversions (for example, using the CAST or CONVERT function) exist either to or from the LONG BINARY data
type.
Truncation of LONG BINARY data during conversion of LONG BINARY to BINARY or VARBINARY is handled
the same way the truncation of BINARY and VARBINARY data is handled. If the STRING_RTRUNCATION option
is ON, any right-truncation (of any values, not just non-space characters) on INSERT or UPDATE of a binary
column results in a truncation error and the transaction is rolled back.
Character large object (CLOB) data in SAP IQ is stored in columns of data type LONG VARCHAR or CLOB.
An individual LONG VARCHAR data value can have a length ranging from zero (0) to 512 TB (terabytes) for an IQ
page size of 128 KB, or 2 PB (petabytes) for an IQ page size of 512 KB. (The maximum length is equal to 4 GB
multiplied by the database page size.) To accommodate a table with LONG VARCHAR data, an IQ database must
be created with an IQ page size of at least 64 KB (65536 bytes).
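The length limits above follow directly from the rule that the maximum length is 4 GB multiplied by the page size. The following Python sketch checks the arithmetic; the function and constant names are hypothetical, and the units are binary (1 KB = 1024 bytes).

```python
KB = 1024
GB = 1024 ** 3
TB = 1024 ** 4
PB = 1024 ** 5

# Minimum IQ page size stated above for tables with LONG VARCHAR data.
MIN_LOB_PAGE_SIZE = 64 * KB


def max_lob_length(page_size_bytes):
    """Return the maximum LOB value length in bytes for a given IQ page size."""
    if page_size_bytes < MIN_LOB_PAGE_SIZE:
        raise ValueError("IQ page size must be at least 64 KB (65536 bytes)")
    return 4 * GB * page_size_bytes


print(max_lob_length(128 * KB) == 512 * TB)  # 128 KB pages give 512 TB
print(max_lob_length(512 * KB) == 2 * PB)    # 512 KB pages give 2 PB
```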
A table or database can contain any number of LONG VARCHAR columns up to the supported maximum
columns per table and maximum columns per database, respectively.
You can create a LONG VARCHAR column using the domain CLOB, when you create a table or add a column to
an existing table. For example:
You can create a WORD (WD) index on a LONG VARCHAR column. You cannot construct other non-FP index types
and join indexes on a LONG VARCHAR column.
You can modify a LONG VARCHAR column using the UPDATE, INSERT...VALUES, INSERT...SELECT, LOAD
TABLE, DELETE, TRUNCATE, SELECT...INTO and INSERT...LOCATION SQL statements. Positioned updates
and deletes are not supported on LONG VARCHAR columns.
You can insert an SAP Adaptive Server Enterprise TEXT column into a LONG VARCHAR column using the
INSERT...LOCATION command. All TEXT data inserted is silently right-truncated at 2147483648 bytes (2
GB).
There are limited implicit data type conversions to and from the LONG VARCHAR data type and non-LONG
VARCHAR data types.
There are no implicit data type conversions from the LONG VARCHAR data type to another non-LONG
VARCHAR data type, except LONG BINARY, and CHAR and VARCHAR for INSERT and UPDATE only. There
are implicit conversions to LONG VARCHAR data type from CHAR and VARCHAR data types. There are no
implicit conversions from BIT, REAL, DOUBLE, NUMERIC, TINYINT, SMALLINT, INT, UNSIGNED INT, BIGINT,
UNSIGNED BIGINT, BINARY, VARBINARY, or LONG BINARY data types to LONG VARCHAR data type. Implicit
conversion can be controlled using the CONVERSION_MODE database option.
The currently supported string functions for the LONG VARCHAR data type are accepted as input for implicit
conversion for the INSERT and UPDATE statements.
The LONG VARCHAR data type can be explicitly converted to CHAR and VARCHAR. No other explicit data type
conversions (for example, using the CAST or CONVERT function) exist either to or from the LONG VARCHAR data
type.
Truncation of LONG VARCHAR data during conversion of LONG VARCHAR to CHAR is handled the same way the
truncation of CHAR data is handled. If the STRING_RTRUNCATION option is ON and string right-truncation of
non-spaces occurs, a truncation error is reported and the transaction is rolled back. Trailing partial multibyte
characters are replaced with spaces on conversion.
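The truncation rule for converting LONG VARCHAR to CHAR can be sketched as follows. This Python fragment is an illustration only: the names are hypothetical, it counts characters rather than bytes, and it assumes that truncation is silent when STRING_RTRUNCATION is OFF, which this section does not state.

```python
class TruncationError(Exception):
    """Raised when right-truncation would discard non-space characters."""


def to_char(value, width, string_rtruncation=True):
    """Convert value to a CHAR(width)-like string under the rule described above."""
    kept, lost = value[:width], value[width:]
    # With the option ON, losing anything other than spaces is an error.
    if string_rtruncation and lost.strip(" "):
        raise TruncationError(f"right-truncation of non-space data: {lost!r}")
    # Assumption: with the option OFF, the value is truncated silently.
    return kept


print(to_char("abc   ", 3))                             # only spaces are lost
print(to_char("abcdef", 3, string_rtruncation=False))   # truncated without error
try:
    to_char("abcdef", 3)
except TruncationError as err:
    print("error:", err)
```

For LONG BINARY conversions the check is stricter: as described for BINARY and VARBINARY, any right-truncation is an error when the option is ON, not only truncation of non-space values.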
Inbound LONG BINARY and LONG VARCHAR variables (host variables or SQL variables used by IQ) have no
maximum length.
Outbound LONG BINARY and LONG VARCHAR variables (variables set by IQ) have a maximum length of 2 GB -
1.
The LOAD TABLE, INSERT…VALUES, INSERT…SELECT, INSERT…LOCATION, SELECT…INTO, and UPDATE SQL
statements accept LONG BINARY and LONG VARCHAR variables of any size of data. Currently, a SQL variable
can hold up to 2 GB - 1 in length.
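The limit of 2 GB - 1 corresponds to 2147483647 bytes. The following Python sketch shows a simple size check against that limit; the constant and function names are hypothetical and are not part of any SAP IQ interface.

```python
# 2 GB - 1, the maximum length of an outbound LOB variable or SQL variable.
MAX_LOB_VARIABLE_BYTES = 2 * 1024 ** 3 - 1


def fits_in_sql_variable(length_in_bytes):
    """Return True if a value of this length fits in a SQL variable."""
    return 0 <= length_in_bytes <= MAX_LOB_VARIABLE_BYTES


print(MAX_LOB_VARIABLE_BYTES)
print(fits_in_sql_variable(2 * 1024 ** 3))  # exactly 2 GB is one byte too long
```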
In this section:
The database option ENABLE_LOB_VARIABLES controls the data type conversion of large object variables.
ENABLE_LOB_VARIABLES Option
SAP IQ supports the TEXT index on LONG BINARY and LONG VARCHAR columns, and the WORD (WD) index on
LONG VARCHAR columns.
In this section:
The TEXT index supports LONG BINARY and LONG VARCHAR columns.
SAP IQ offers limited support for the WORD (WD) index on LONG VARCHAR (CLOB) columns.
• The widest column supported by the WD index is the maximum width for a LOB column (the maximum
length is equal to 4 GB multiplied by the database page size). The maximum word length supported by SAP
IQ is 255 bytes.
• All sp_iqcheckdb options for WD indexes over CHAR and VARCHAR columns are also supported for LONG
VARCHAR (CLOB) columns, including allocation, check, and verify modes.
• You can use sp_iqrebuildindex to rebuild a WD index over a LONG VARCHAR (CLOB) column.
Chinese text or documents in a binary format require ETL preprocessing to locate and transform words into a
form that can be parsed by the WD index.
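A preprocessing step can check text against the 255-byte word limit before the data is loaded. The following Python sketch is only an illustration: it assumes UTF-8 data and whitespace as the word delimiter, the names are hypothetical, and it does not reflect how the WD index itself separates words.

```python
# Maximum word length supported by the WD index, as stated above.
MAX_WD_WORD_BYTES = 255


def oversized_words(text, encoding="utf-8"):
    """Return the whitespace-delimited words whose encoded length exceeds the limit."""
    return [
        word for word in text.split()
        if len(word.encode(encoding)) > MAX_WD_WORD_BYTES
    ]


sample = "short " + "x" * 256 + " words"
print(len(oversized_words(sample)))  # one word is longer than 255 bytes
```

Because the limit is expressed in bytes, a word made of multibyte characters reaches it with fewer characters than a word made of single-byte characters.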
Learn about the SQL statements and syntax that support working with TEXT indexes and text configurations.
Learn about the functions that support the LONG BINARY and LONG VARCHAR data types.
In addition to the functions listed in this table, you can use the BFILE function to extract LOB data.
Scalar and aggregate user-defined functions support large object data types as input parameters.
Function    BLOB Data Supported?    BLOB Variables Supported?    CLOB Data Supported?    CLOB Variables Supported?
*The BYTE_LENGTH function supports LONG BINARY and LONG VARCHAR columns and variables only if the
query returns less than 2 GB of data. If the byte length of the returned LONG BINARY or LONG VARCHAR data
is greater than 2 GB, BYTE_LENGTH returns an error stating that you must use the BYTE_LENGTH64
function.
For full descriptions of these functions and examples, see the SQL Functions section in SAP IQ SQL Reference.
In this section:
Only the aggregate function COUNT (*) is supported for LONG BINARY and LONG VARCHAR columns.
The COUNT DISTINCT parameter is not supported. An error is returned if a LONG BINARY or LONG VARCHAR
column is used with the MIN, MAX, AVG, or SUM aggregate functions.
Scalar and aggregate user-defined functions support large object data types LONG VARCHAR (CLOB) and LONG
BINARY (BLOB) up to 4 GB (gigabytes) as input parameters. LOB data types are not supported as output
parameters.
You must be specifically licensed to use the User-defined Function support functionality.