DT-EDU-DEN80EDU18S04 Cache Configuration
Cache Configuration
Denodo Platform 8.0
Denodo
Operations
Management
3. Cache Maintenance
4. Cache Best Practices
The Cache module can store copies of the source data in a relational database
accessible through JDBC.
■ This module automatically maintains a table in the cache database for each
base view or view for which the cache has been configured.
■ From an administration point of view, the cache server can be configured
globally at server level or per virtual database.
For very large data loads (millions of rows), many DBMSs provide a Bulk Data Load API.
A Bulk Data Load differs from a “normal” load in two ways:
■ It does not run single or multiple INSERT statements to load the data.
■ It is optimized for huge data loads, so significant improvements are seen only
when dealing with large amounts of data.
If the DBMS used as the cache system supports the Bulk Data Load feature:
■ The Virtual DataPort Server writes the data to an external file in a specific format.
■ These external files are not created for every DBMS.
■ The files are then transferred to the DBMS.
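The two-phase flow described above (write the data to a file, then hand the file to the database) can be sketched as follows. This is a minimal illustration, not Denodo's implementation: the table `cached_view`, the helper names, and the CSV format are all hypothetical, and SQLite stands in for the cache DBMS. SQLite's Python API has no native bulk loader, so `executemany()` over the file plays that role here; a real DBMS would use its own bulk load utility.

```python
import csv
import os
import sqlite3
import tempfile


def write_staging_file(rows, directory=None):
    """Phase 1: dump the rows to a delimited temporary file."""
    fd, path = tempfile.mkstemp(suffix=".csv", dir=directory)
    with os.fdopen(fd, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)
    return path


def load_staging_file(conn, path):
    """Phase 2: hand the whole file to the database in one step.

    Assumption: a real DBMS would call its native bulk loader here;
    executemany() is only a stand-in for that step.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        with conn:  # one transaction for the whole file
            conn.executemany(
                "INSERT INTO cached_view (id, name) VALUES (?, ?)",
                csv.reader(handle),
            )


# Hypothetical cache table and sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cached_view (id INTEGER, name TEXT)")
rows = [(i, f"row-{i}") for i in range(1000)]

staging_path = write_staging_file(rows)
load_staging_file(conn, staging_path)
os.remove(staging_path)  # temporary files are discarded after the load

loaded = conn.execute("SELECT COUNT(*) FROM cached_view").fetchone()[0]
print(loaded)
```

The point of the sketch is the shape of the process: the rows never travel as individual INSERT statements issued by the client, and the staging file is a temporary artifact that can be deleted once the load finishes.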
Depending on the DBMS selected as the cache system, some parameters can be configured
when enabling this feature. A typical configuration includes:
■ Work Path:
■ This field lets administrators choose the directory where the temporary files are created,
instead of using the path where the Denodo Platform is installed.
■ It can point to a fast storage location, such as an SSD or an in-memory drive.
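The effect of a work path setting can be illustrated with a small sketch. It is not Denodo code: `create_temp_data_file` and the `bulk_` file prefix are hypothetical, and the fallback here is the operating system's temporary directory rather than the Denodo installation path.

```python
import os
import tempfile


def create_temp_data_file(work_path=None):
    """Create an empty temporary data file under work_path.

    When work_path is None, the system default temp directory is used
    (an assumption of this sketch; Denodo defaults to its install path).
    """
    if work_path is not None:
        os.makedirs(work_path, exist_ok=True)
    fd, path = tempfile.mkstemp(prefix="bulk_", suffix=".dat", dir=work_path)
    os.close(fd)
    return path


# Stand-in for an SSD or in-memory drive mount point.
fast_dir = tempfile.mkdtemp()

custom_file = create_temp_data_file(fast_dir)
default_file = create_temp_data_file()

print(os.path.dirname(custom_file))
print(os.path.dirname(default_file))
```

Pointing the work path at fast storage only changes where the temporary files live; the load process itself is unchanged.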
The Cache system has a maintenance task that deletes the expired entries.
■ It is activated by default when a Denodo administrator enables the Cache.
■ It executes the CLEAN_CACHE_DATABASE stored procedure in the background.
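The idea behind such a maintenance task can be sketched in a few lines. This does not reproduce what CLEAN_CACHE_DATABASE does internally: the `cache_entries` table, its `expires_at` column, and the `purge_expired` function are hypothetical, and SQLite stands in for the cache database.

```python
import sqlite3


def purge_expired(conn, now_epoch):
    """Delete entries whose expiration time has passed; return how many."""
    with conn:  # commit the deletion as a single transaction
        cursor = conn.execute(
            "DELETE FROM cache_entries WHERE expires_at <= ?", (now_epoch,)
        )
    return cursor.rowcount


# Hypothetical cache metadata: one row per cached entry.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cache_entries (view_name TEXT, expires_at INTEGER)")
conn.executemany(
    "INSERT INTO cache_entries VALUES (?, ?)",
    [("customer", 100), ("orders", 200), ("invoice", 300)],
)

removed = purge_expired(conn, now_epoch=200)
remaining = [row[0] for row in conn.execute("SELECT view_name FROM cache_entries")]
print(removed, remaining)  # 2 ['invoice']
```

Running this kind of cleanup in the background keeps expired rows from accumulating in the cache database without blocking the queries that use it.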
* Check the Labs Guide document for a complete description of the Lab DEN80EDU18S04LAB01
The Bulk Data Load feature should be enabled on the cache engine:
■ When the number of rows to be inserted is in the tens of thousands or higher.
■ With fewer rows, there is no performance gain, and there may even be a
performance decrease.
■ HDFS-based databases require Bulk Data Load to be enabled to be eligible for caching.
■ When caching huge datasets (millions of rows), as it significantly improves the
insertion phase.
The cache system is a DBMS. It is accessed through a JDBC connection pool, which must
be configured properly.
■ The database used as the cache should not also be used as a data source.
■ Overusing the cache may end up making the system slower:
■ Views with the cache enabled need to query the cache even if the data is not actually cached.
■ The system will be slower if the waiting time in the cache connection pool is longer than
the time needed to query the original source.
■ Monitor the connection pool to ensure it is properly sized!
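What "monitoring the connection pool" means can be shown with a minimal sketch that records how long each caller waits for a connection. It is a generic illustration, not the Denodo or JDBC pool: `MonitoredPool`, `run_query`, and the string `"conn-1"` used as a connection are all hypothetical.

```python
import queue
import threading
import time


class MonitoredPool:
    """Fixed-size pool that records how long each caller waits."""

    def __init__(self, connections):
        self._idle = queue.Queue()
        for connection in connections:
            self._idle.put(connection)
        self._lock = threading.Lock()
        self.wait_times = []

    def acquire(self, timeout=None):
        started = time.monotonic()
        connection = self._idle.get(timeout=timeout)  # blocks while the pool is empty
        waited = time.monotonic() - started
        with self._lock:
            self.wait_times.append(waited)
        return connection

    def release(self, connection):
        self._idle.put(connection)

    def max_wait(self):
        with self._lock:
            return max(self.wait_times, default=0.0)


def run_query(pool, hold_seconds):
    connection = pool.acquire()
    try:
        time.sleep(hold_seconds)  # stands in for a query against the cache
    finally:
        pool.release(connection)


# One connection shared by two concurrent queries: the second must wait.
pool = MonitoredPool(["conn-1"])
workers = [threading.Thread(target=run_query, args=(pool, 0.2)) for _ in range(2)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

print(f"max wait: {pool.max_wait():.2f}s")
```

If the recorded waits approach or exceed the time it takes to query the original source, the pool is undersized for the workload, or too many views have the cache enabled.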