External Tables - Not Just Loading a CSV File
Kim Berg Hansen, Senior Consultant
• Danish geek
• SQL & PL/SQL developer since 2000
• Developer at Trivadis since 2016 (http://www.trivadis.dk)
• Oracle Certified Expert in SQL
• Oracle ACE Director
• Blogger at http://www.kibeha.dk
• SQL quizmaster at http://devgym.oracle.com
• Likes to cook
• Reads sci-fi
• Member of Danish Beer Enthusiasts
OPERATION
Definition created in data dictionary* like normal table (only data is outside DB)
(* in 18c not necessarily - more on that later)
Specify type (access driver), directory and location (file)
Specify access parameters depending on access driver
create table ext_tab (fk number, col2 varchar2(10))
organization external (
type oracle_loader
access parameters (
records delimited by newline
fields terminated by ";" optionally enclosed by '"'
( fk integer external(6), col2 char(10) )
)
location (ext_dir:'file.txt')
);
The 18c documentation states that opaque_format_spec is written in quotes for inline EXTERNAL and EXTERNAL MODIFY, and without quotes for CREATE TABLE
– This appears to be a doc bug - without quotes always seems to work
Alternatively, a subquery (USING CLOB) can return the access parameters
An inline external table stores nothing in the data dictionary (hence also less information for the optimizer)
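The two query-time variants mentioned above can be sketched as follows. This is a minimal illustration, assuming the ext_tab table and ext_dir directory from the earlier example; the file name file2.txt is hypothetical. The access parameters are written without quotes, as the slide says that form seems to work.

```sql
-- Inline external table (18c): no CREATE TABLE, nothing stored in the data dictionary.
select *
from external (
   (fk number, col2 varchar2(10))
   type oracle_loader
   default directory ext_dir
   access parameters (
      records delimited by newline
      fields terminated by ";" optionally enclosed by '"'
   )
   location ('file.txt')
   reject limit unlimited
) e;

-- EXTERNAL MODIFY: reuse the dictionary definition of ext_tab,
-- but point this one query at a different (hypothetical) file.
select *
from ext_tab external modify (
   location (ext_dir:'file2.txt')
);
```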
CHARACTERSET
– Which character set the file is in (default is the DB character set, not the client's)
LANGUAGE
– Which language is used for month names, AM/PM, etc. in the file
TERRITORY
– How are decimal / thousand separators, week numbers, etc. in the file
DATA IS BIG ENDIAN / DATA IS LITTLE ENDIAN
– Which endianness was used by the platform where the file originated
FIXED
– Each record a fixed length (in bytes)
VARIABLE
– Start of each record contains a character count
DELIMITED BY
– Each record ends with a given string
XMLTAG
– Each record is the content within a given XML tag: <MYTAG>....</MYTAG>
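As a rough sketch of how the record-level clauses above fit together, the following table reads fixed-length records in a non-default character set. The table name, file name and character set are assumptions for illustration only; the record length of 20 bytes assumes 19 data bytes plus one newline byte.

```sql
create table ext_fixed_rec (
   first_name    varchar2(7)
 , last_name     varchar2(8)
 , year_of_birth varchar2(4)
)
organization external (
   type oracle_loader
   default directory ext_dir
   access parameters (
      -- each record is exactly 20 bytes: 7 + 8 + 4 data bytes + 1 newline
      records fixed 20
      -- the file is not in the DB character set (assumed Windows Latin-1 here)
      characterset WE8MSWIN1252
      fields (
         first_name    char(7)
       , last_name     char(8)
       , year_of_birth char(4)
      )
   )
   location ('info.dat')
);

-- Other record formats replace the RECORDS FIXED line, for example:
--    records variable 2              (first 2 bytes of each record hold its length)
--    records delimited by newline
--    records xmltag ("MYTAG")        (each <MYTAG>...</MYTAG> element is a record)
```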
The field list for the file does not have to match the column list of the table directly; fields can be mapped differently
ALL FIELDS OVERRIDE - states that the field list does match the table columns directly
– Then list only the fields that need extra information, such as a non-default date format
The FIELD NAMES clause tells how to handle a first line that contains field names
– It can be ignored, or fields can be mapped automatically by field name
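A minimal sketch of ALL FIELDS OVERRIDE, assuming a hypothetical emp.txt file whose fields arrive in the same order as the table columns. Only the date field is listed, because it is the only one that needs extra information.

```sql
create table ext_emp (
   emp_no number
 , name   varchar2(20)
 , hired  date
)
organization external (
   type oracle_loader
   default directory ext_dir
   access parameters (
      records delimited by newline
      fields terminated by ";" optionally enclosed by '"'
      -- fields map 1:1 to the table columns; the list below only overrides defaults
      all fields override
      (
         hired char(10) date_format date mask "YYYY-MM-DD"
      )
   )
   location ('emp.txt')
);
```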
TERMINATED BY / [OPTIONALLY] ENCLOSED BY
FIELDS CSV
– WITH / WITHOUT EMBEDDED - does file contain record delim within string fields
– TERMINATED / ENCLOSED - override default , and "
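For comparison with the explicit TERMINATED BY / ENCLOSED BY form, here is a sketch of the FIELDS CSV shorthand. The table and file names are hypothetical, and the file is assumed to use the default comma and double-quote with no newlines inside quoted strings.

```sql
create table ext_csv (
   fk   number
 , col2 varchar2(10)
)
organization external (
   type oracle_loader
   default directory ext_dir
   access parameters (
      records delimited by newline
      -- defaults: terminated by ',' and optionally enclosed by '"'
      -- WITHOUT EMBEDDED: no record delimiters inside enclosed string fields
      fields csv without embedded
   )
   location ('file.csv')
);
```

If the file uses a semicolon instead, the default can be overridden with `fields csv without embedded terminated by ';'`.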
Start position
– Digit is position directly
– * means the start is the char after the end of previous field
– *+{offset} or *-{offset} means plus or minus offset chars after end of previous field
End can be specified as position (Digit) or as length (+Digit)
STRING SIZES ARE IN
– Parameter says if positions are measured in bytes or chars (for multibyte charsets)
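The position notation can be sketched as below, assuming a hypothetical fixed-position file where a 15-character first name is followed directly by a 20-character last name, one filler character, and a 4-digit year.

```sql
create table ext_pos (
   first_name    varchar2(15)
 , last_name     varchar2(20)
 , year_of_birth number
)
organization external (
   type oracle_loader
   default directory ext_dir
   access parameters (
      records delimited by newline
      -- measure positions and lengths in characters rather than bytes
      string sizes are in characters
      fields rtrim (
         first_name    (1:15)    char(15)  -- absolute start 1, absolute end 15
       , last_name     (*:+20)   char(20)  -- starts right after previous field, length 20
       , year_of_birth (*+1:+4)  char(4)   -- skips 1 char after previous field, length 4
      )
   )
   location ('people.txt')
);
```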
PREPROCESSOR [{directory}:]{script_or_exe_file}
Must have EXECUTE privilege on directory object
Can be different directory than the datafile - this is recommended for security
Preprocessor script/exe will be called with filename from LOCATION as parameter
Standard output from script/exe will become the input for the EXTERNAL TABLE
Cannot specify arguments directly
– if executable requires arguments, must wrap it in a script
Windows script (batch file) must have suffix .bat or .cmd
Windows batch file must start with @echo off
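A hedged sketch of the preprocessor setup described above. The directory object exec_dir, the user name scott, the script name unzip_it.sh and the file name are all assumptions; the script is a wrapper because arguments cannot be given in the PREPROCESSOR clause itself.

```sql
-- Separate directory objects: one for data, one for executables.
grant read, write on directory ext_dir  to scott;
grant execute     on directory exec_dir to scott;

-- Hypothetical wrapper script exec_dir/unzip_it.sh (Linux), two lines:
--    #!/bin/sh
--    /bin/gunzip -c $1
-- Oracle passes the file name from LOCATION as $1;
-- the script's standard output becomes the input of the external table.

create table ext_zipped (
   fk   number
 , col2 varchar2(10)
)
organization external (
   type oracle_loader
   default directory ext_dir
   access parameters (
      records delimited by newline
      preprocessor exec_dir:'unzip_it.sh'
      fields terminated by ";" optionally enclosed by '"'
   )
   location ('file.txt.gz')
);
```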
Multiple files
– Each file specified in LOCATION is handled by one slave process
- setting the parallel degree larger than the number of files does not help
– This also means that PREPROCESSOR is called once per file, by the slave process handling that file
Large files
– ORACLE_LOADER parallel select can attempt to assign file chunks to slaves
– This cannot always be done, for example not with:
- Named pipes as input
- Multibyte charactersets (unless fixed byte length records)
- Variable length records with length indicator bytes
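To illustrate the multiple-files case, the sketch below points the earlier ext_tab table at four hypothetical files and queries it with a matching parallel degree, so each slave process can take one file.

```sql
-- Four files, so a parallel degree above 4 gains nothing for file-level parallelism.
alter table ext_tab location (
   ext_dir:'file1.txt'
 , ext_dir:'file2.txt'
 , ext_dir:'file3.txt'
 , ext_dir:'file4.txt'
);

select /*+ parallel(e 4) */ count(*)
from ext_tab e;
```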
COMPRESSION
– ENABLED BASIC / LOW / MEDIUM / HIGH
- requires Advanced Compression option
ENCRYPTION
– ENABLED / DISABLED
VERSION
– COMPATIBLE / LATEST / version number
Create external table on an existing Dump File (for example from other DB)
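The ORACLE_DATAPUMP driver works in both directions, which can be sketched as follows. The source table orders and the dump file name are hypothetical, and the compression setting is only usable where the Advanced Compression option is licensed.

```sql
-- Unload: CREATE TABLE ... AS SELECT writes the dump file.
create table ext_orders_unload
organization external (
   type oracle_datapump
   default directory ext_dir
   access parameters (
      compression enabled basic
   )
   location ('orders.dmp')
)
as
select order_id, customer_id, order_total
from orders;

-- Load: another external table (possibly in another DB) reads the existing dump file.
-- Column names and datatypes are assumed to match what was unloaded.
create table ext_orders_read (
   order_id    number
 , customer_id number
 , order_total number
)
organization external (
   type oracle_datapump
   default directory ext_dir
   location ('orders.dmp')
);
```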
External HDFS / HIVE tables for Oracle Big Data SQL (licensed product)
– Hadoop Clusters on Oracle Big Data Appliance
– Database on Exadata
HIVE metadata exposed to database
– ORACLE_HIVE external tables can just specify columns and HIVE cluster/table
– Can override mappings if desired
With ORACLE_HDFS you specify HIVE-style metadata directly; no table in the HIVE catalog is needed
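A rough sketch of an ORACLE_HIVE external table, assuming a Big Data SQL installation and a hypothetical HIVE table order_db.order_summary. The parameter name com.oracle.bigdata.tablename is given as I recall it from the Big Data SQL documentation and should be verified against the installed version.

```sql
create table ext_order_summary (
   cust_num    varchar2(10)
 , order_num   varchar2(20)
 , order_total number(8,2)
)
organization external (
   type oracle_hive
   default directory default_dir
   access parameters (
      -- HIVE table whose metadata is used for the column mapping (hypothetical name)
      com.oracle.bigdata.tablename: order_db.order_summary
   )
);
```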