Infobright Best Practices


Best Practices and Optimization with Infobright
John Ringhofer, Sales Engineer
john.ringhofer@infobright.com

Agenda

Infobright Architecture Review
Installation Tips
Leveraging the Architecture
Toolset Integration
Getting Started Down the Right Path
Q&A

Infobright Architecture Review

Infobright Technology: Key Concepts

1. Column orientation
2. Data packs and compression
3. Knowledge Grid
4. Granular Engine

1. Column Orientation
Incoming Data

EMP_ID  FNAME  LNAME   SALARY
1       Moe    Howard  10000
2       Curly  Joe     12000
3       Larry  Fine    9000

Column-Oriented Layout
(1,2,3;Moe,Curly,Larry;Howard,Joe,Fine;10000,12000,9000;)

Works well with aggregate results (sum, count, avg.)
Only columns that are relevant need to be touched
Consistent performance with any database design
Allows for very efficient compression
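The pivot from rows to columns can be sketched in a few lines of Python. This is an illustration of the layout shown on the slide, not Infobright's actual storage format; the helper name to_columns is hypothetical.

```python
# Illustrative sketch only: this mirrors the slide's layout, not Infobright's
# on-disk format. The helper name `to_columns` is hypothetical.
rows = [
    (1, "Moe", "Howard", 10000),
    (2, "Curly", "Joe", 12000),
    (3, "Larry", "Fine", 9000),
]
column_names = ("EMP_ID", "FNAME", "LNAME", "SALARY")


def to_columns(rows, column_names):
    """Pivot row tuples into one list of values per column."""
    return {name: list(values) for name, values in zip(column_names, zip(*rows))}


columns = to_columns(rows, column_names)

# An aggregate reads only the one column it needs.
total_salary = sum(columns["SALARY"])

# Serialize in the same shape as the slide's column-oriented layout.
layout = "(" + "".join(
    ",".join(str(value) for value in columns[name]) + ";" for name in column_names
) + ")"

print(layout)        # (1,2,3;Moe,Curly,Larry;Howard,Joe,Fine;10000,12000,9000;)
print(total_salary)  # 31000
```

Because the salaries sit together in one list, the sum never touches the name columns, which is the point of the second bullet above.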

2. Data Packs and Compression


Data Packs
[Figure: a column divided into data packs of 64K values each]
Each data pack contains 65,536 data values
Compression is applied to each individual data pack
The compression algorithm varies depending on data type and distribution

Compression
Patent-pending compression algorithms
Results vary depending on the distribution of data among data packs
A typical overall compression ratio seen in the field is 10:1
Some customers have seen results of 40:1 and higher
For example, 1TB of raw data compressed 10 to 1 would only require 100GB of disk capacity
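The sizing arithmetic above can be written out as a small sketch. The function names are hypothetical, and like the slide the example treats 1TB as 1000GB; real ratios depend on the data.

```python
import math

# From the slide: each data pack holds 65,536 values of one column.
VALUES_PER_PACK = 65_536


def data_packs_per_column(row_count):
    """Number of data packs needed to hold one column of `row_count` values."""
    return math.ceil(row_count / VALUES_PER_PACK)


def compressed_size_gb(raw_gb, ratio):
    """Disk capacity needed for `raw_gb` of raw data at a ratio of `ratio`:1."""
    return raw_gb / ratio


print(data_packs_per_column(1_000_000))  # 16 (15 full packs plus one partial)
print(compressed_size_gb(1000, 10))      # 100.0, the slide's 1TB at 10:1 example
print(compressed_size_gb(1000, 40))      # 25.0, the same data at 40:1
```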

3. The Knowledge Grid


Knowledge Grid
  Applies to the whole table
  Information about the data

Knowledge Nodes
  Built for each Data Pack

[Figure: Column A and Column B, each divided into data packs DP1 to DP6]

Global knowledge
  String and character data
  Numeric data
  Distributions
  Built during LOAD

Dynamic knowledge
  Built per query
  E.g. for aggregates, joins

Knowledge Nodes answer the query directly, or
Identify only required Data Packs, minimizing decompression, and
Predict required data in advance based on workload
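A minimal sketch of how per-pack statistics can answer part of a query and skip packs entirely. It assumes each knowledge node stores only min, max, sum and count for a numeric pack; the slides do not list the statistics Infobright actually keeps, and the class and function names are hypothetical. The pack size is 4 here so the example stays readable.

```python
from dataclasses import dataclass

PACK_SIZE = 4  # tiny for illustration; the slides give 65,536 values per pack


@dataclass
class PackStats:
    """Assumed per-pack statistics (a stand-in for a knowledge node)."""
    minimum: int
    maximum: int
    total: int
    count: int


def build_stats(values, pack_size=PACK_SIZE):
    """Split a column into packs and compute statistics for each pack."""
    packs = [values[i:i + pack_size] for i in range(0, len(values), pack_size)]
    stats = [PackStats(min(p), max(p), sum(p), len(p)) for p in packs]
    return packs, stats


def sum_where_greater(packs, stats, threshold):
    """SUM(value) WHERE value > threshold, opening as few packs as possible."""
    total = 0
    opened = 0
    for pack, s in zip(packs, stats):
        if s.maximum <= threshold:
            continue              # irrelevant pack: no value can match
        if s.minimum > threshold:
            total += s.total      # every value matches: answered from statistics
            continue
        opened += 1               # mixed pack: the only case that reads the data
        total += sum(v for v in pack if v > threshold)
    return total, opened


values = [1, 2, 3, 4, 10, 11, 12, 13, 5, 9, 2, 20]
packs, stats = build_stats(values)
result, opened = sum_where_greater(packs, stats, 8)
print(result, opened)  # 75 1
```

Only one of the three packs is read; the other two are resolved from statistics alone.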

4. Granular Engine
[Figure: Report, Granular Engine (Rough Set), Infobright Database, Compressed Data; the figure carries the label 1%]

Leverage DomainExpert
DomainExpert: Breakthrough Analytics
Enables Infobright and users to add intelligence into the Knowledge Grid directly, with no schema changes
Optimized for web data analysis
  IP addresses
  Email addresses
  URL/URI
Intelligence to automatically optimize the database
Can cut query time in half when using this data
Improves compression

Leverage DomainExpert
Pattern recognition in data enables faster query performance
  Patterns defined and stored
  Complex fields decomposed into more homogeneous parts
  Database uses this information when processing query

IB 4.0 delivered with pre-defined data types common to machine-generated data
  URL
  E-Mail addresses
  IP Addresses

Users can also easily add their own data patterns
  Identify strings, numerics, or constants
  Financial trading example: ticker feed AAPL-350,354,347,349 encoded as %s-%d,%d,%d,%d
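The decomposition idea can be sketched with a regular expression standing in for the %s-%d,%d,%d,%d pattern. This is not DomainExpert's own syntax or implementation. It assumes the feed record reads AAPL-350,354,347,349 (with the hyphen the pattern implies) and that the %s part contains no hyphen; the function name decompose is hypothetical.

```python
import re

# Regex stand-in for the slide's pattern %s-%d,%d,%d,%d.
# Assumption: the %s part contains no hyphen.
TICKER_PATTERN = re.compile(r"^([^-]+)-(\d+),(\d+),(\d+),(\d+)$")


def decompose(record):
    """Split one feed record into a string part and four integer parts."""
    match = TICKER_PATTERN.match(record)
    if match is None:
        return None  # record does not follow the pattern
    symbol, *numbers = match.groups()
    return symbol, [int(n) for n in numbers]


print(decompose("AAPL-350,354,347,349"))  # ('AAPL', [350, 354, 347, 349])
print(decompose("not a ticker record"))   # None
```

Each part is now homogeneous: one short string and four small integers, rather than one opaque string per record.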

http://www.infobright.com/News-&-Events/Events/

Infobright Architected on MySQL


The world's most popular open source database

Installation Tips

Install Directory: Don't install to Linux /home or Windows /program files
IEE Evaluation Key (Linux or Windows): Put the .lic file in the /path/to/infobright directory. (Eval only)
IEE Evaluation Key (Windows): Put the .license file in the installation directory. (Eval only)
IEE Evaluation: After starting Infobright for the first time, back up the data/iblicense.dat file.

Note: The memory settings assume that there are no other services on the machine consuming significant memory. If this is not the case, lower the memory settings for Infobright.

Leveraging the Architecture

Leverage Column Orientation

Do
  Create tables with lots of columns
  Only use the columns you need to complete a specific query
Avoid
  SELECT * from wide tables
  Using views with many columns

Leverage Data Packs - Bulk Loads

Transactional inserts
  Expect transactional insert to be slow
  Uses MySQL API
  When using JDBC, refer to: http://www.infobright.org/Blog/Entry/using_the_mysql_jdbc_driver_with_infobright_for_data_loading/
  Overhead: decompression and compression
Infobright Bulk Loader
  Optimized for Infobright
  Writes data packs, not rows
  Up to 150GB per hour
Distributed Load Processor
  Optimized for Infobright
  Multiple servers creating data packs
  Has loaded 10TB per hour into a SINGLE table
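One way to avoid row-by-row inserts is to stage rows in a delimited file and load the file in a single statement. The sketch below writes the file with Python's csv module and builds a standard MySQL LOAD DATA INFILE statement as a string. The file name, sample rows and helper names are hypothetical, and the exact loader options your Infobright version accepts should be checked against its data loading guide.

```python
import csv
import os
import tempfile


def stage_rows(rows, path):
    """Write rows to a comma-delimited file and return the number of rows."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter=",", quotechar='"', lineterminator="\n")
        writer.writerows(rows)
    return len(rows)


def load_statement(path, table):
    """Build a standard MySQL LOAD DATA statement for the staged file.

    Assumption: the server can read `path` and accepts these clauses.
    """
    return (
        f"LOAD DATA INFILE '{path}' INTO TABLE {table} "
        "FIELDS TERMINATED BY ',' ENCLOSED BY '\"' "
        "LINES TERMINATED BY '\\n'"
    )


# Hypothetical sample rows: (trans_date, msa_id, dlr_trans_amt).
sample_rows = [("2006-03-01", 1, 19.99), ("2006-03-02", 2, 5.0)]

with tempfile.TemporaryDirectory() as staging_dir:
    staged_path = os.path.join(staging_dir, "fact_sales.csv")
    staged_count = stage_rows(sample_rows, staged_path)
    with open(staged_path) as handle:
        staged_text = handle.read()
    statement = load_statement(staged_path, "fact_sales")

print(staged_count)
print(statement)
```

The statement would then be sent once over any MySQL connection, so the server builds data packs from the whole file instead of handling one insert at a time.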

Leverage Compression - Backups/Restore

Compressed data = faster backup
  Avoid MySQLDump
  Use compressed database files
Backup Procedure
  Entire Infobright Directory
  Full Backup (e.g. Weekly)
  Regular Incremental Backups (e.g. Daily)
Restore Procedure
  Copy backup image to Infobright directory (e.g. data image to data directory)
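A file-level sketch of the full plus incremental procedure, assuming that an incremental backup means copying files modified since the previous backup. The slides do not say whether the server must be stopped or quiesced while files are copied, so confirm that in the product documentation. The directory layout, file names and function names here are hypothetical.

```python
import os
import shutil
import tempfile


def incremental_backup(source_dir, backup_dir, since_epoch):
    """Copy files under source_dir that were modified after since_epoch."""
    copied = []
    for folder, _subdirs, files in os.walk(source_dir):
        for name in files:
            src = os.path.join(folder, name)
            if os.path.getmtime(src) > since_epoch:
                rel = os.path.relpath(src, source_dir)
                dst = os.path.join(backup_dir, rel)
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src, dst)  # copy2 keeps the modification time
                copied.append(rel)
    return sorted(copied)


def full_backup(source_dir, backup_dir):
    """Copy every file, i.e. an incremental backup with no lower time bound."""
    return incremental_backup(source_dir, backup_dir, since_epoch=float("-inf"))


# Demonstration on a throwaway directory tree with hypothetical file names.
with tempfile.TemporaryDirectory() as work:
    source = os.path.join(work, "infobright")
    os.makedirs(os.path.join(source, "data"))
    for name, mtime in (("old.dat", 1000), ("new.dat", 2000)):
        path = os.path.join(source, "data", name)
        with open(path, "w") as handle:
            handle.write(name)
        os.utime(path, (mtime, mtime))  # pretend the file was written at `mtime`

    weekly = full_backup(source, os.path.join(work, "full"))
    daily = incremental_backup(source, os.path.join(work, "incr"), since_epoch=1500)

print(weekly)  # both files
print(daily)   # only the file changed after the previous backup
```

Restore is the reverse copy: the backup image goes back into the Infobright directory, with later incrementals applied over the full image.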

Leverage Fault Tolerance - High Availability

Block level replication (Active/Passive)
  Moves compressed data image
  DRBD
  Use DLP, Infobright loader, or MySQL loader
  Use IEE or ICE

Leverage Fault Tolerance - High Availability

Use DLP to load the same data to multiple locations simultaneously (IEE only, Active/Active)
  Very fast data loading
  Can be scaled out to support as many servers as needed
  Servers are highly available to process queries since the data packs are created remotely
  No time lag as with asynchronous replication; data is immediately available to be queried

Leverage the Knowledge Grid


Everyone wants to be a Star
  Do constrain the fact table directly
  Do use sub-selects instead of joins
  Do add additional columns to create useful knowledge nodes
  Do remove references to indexes and other constraints (PK, FK)
  Do remove aggregate, reporting and summary tables per use case

Adding as many WHERE conditions as you can to your SQL increases the chance that knowledge grid statistics can be used to speed up your queries.

Leverage the Knowledge Grid - Query Example


Original SQL
select sum(dlr_trans_amt), a.msa_id
from fact_sales a, dim_dates b, dim_msa c
where a.trans_date = b.trans_date
  and a.msa_id = c.msa_id
  and b.trans_year = 2006
  and b.trans_month = 'MARCH'
  and c.msa_name in ('BIRMINGHAM-HOOVER', 'NAPLES-MARCO ISLAND', 'CHAMPAIGN-URBANA')
group by a.msa_id;
3 rows in set (3 min 11.65 sec)

Becomes
select sum(dlr_trans_amt), msa_id
from fact_sales a
where trans_date in (select trans_date from dim_dates b
                     where b.trans_year = 2006 and b.trans_month = 'MARCH')
  and msa_id in (select msa_id from dim_msa
                 where msa_name in ('BIRMINGHAM-HOOVER', 'NAPLES-MARCO ISLAND', 'CHAMPAIGN-URBANA'))
group by msa_id;
3 rows in set (21.28 sec)
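The two query shapes can be checked for equivalence on a toy dataset. The sketch below uses Python's built-in sqlite3 purely as a stand-in SQL engine: it says nothing about Infobright timings, and the sample rows are invented. The rewrite gives the same result only when trans_date and msa_id are unique in their dimension tables; otherwise the join form counts fact rows more than once.

```python
import sqlite3

# sqlite3 is only a stand-in SQL engine here; the rows below are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_dates (trans_date TEXT, trans_year INTEGER, trans_month TEXT);
CREATE TABLE dim_msa (msa_id INTEGER, msa_name TEXT);
CREATE TABLE fact_sales (trans_date TEXT, msa_id INTEGER, dlr_trans_amt REAL);
INSERT INTO dim_dates VALUES ('2006-03-01', 2006, 'MARCH'), ('2006-04-01', 2006, 'APRIL');
INSERT INTO dim_msa VALUES (1, 'CHAMPAIGN-URBANA'), (2, 'ELSEWHERE');
INSERT INTO fact_sales VALUES
  ('2006-03-01', 1, 10.0), ('2006-03-01', 1, 5.0),
  ('2006-03-01', 2, 7.0), ('2006-04-01', 1, 99.0);
""")

JOIN_FORM = """
SELECT sum(dlr_trans_amt), a.msa_id
FROM fact_sales a, dim_dates b, dim_msa c
WHERE a.trans_date = b.trans_date
  AND a.msa_id = c.msa_id
  AND b.trans_year = 2006
  AND b.trans_month = 'MARCH'
  AND c.msa_name IN ('CHAMPAIGN-URBANA')
GROUP BY a.msa_id
"""

SUBSELECT_FORM = """
SELECT sum(dlr_trans_amt), msa_id
FROM fact_sales a
WHERE trans_date IN (SELECT trans_date FROM dim_dates b
                     WHERE b.trans_year = 2006 AND b.trans_month = 'MARCH')
  AND msa_id IN (SELECT msa_id FROM dim_msa
                 WHERE msa_name IN ('CHAMPAIGN-URBANA'))
GROUP BY msa_id
"""

join_rows = conn.execute(JOIN_FORM).fetchall()
subselect_rows = conn.execute(SUBSELECT_FORM).fetchall()
print(join_rows)       # [(15.0, 1)]
print(subselect_rows)  # [(15.0, 1)]
```

In the sub-select form every condition ends up as a constraint on fact_sales itself, which is what "constrain the fact table directly" asks for.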

Leverage the Knowledge Grid - DML


Delete, Truncate and Update Operations
  Query performance can suffer
  Updates mimic Delete/Insert
Drop/Create Alternative
  Speed and Simplicity
  Works in ICE as well as IEE
  Knowledge Nodes permanently deleted
Infobright Reorg (defragment)
  Reload data
  Knowledge Grid refreshes
  http://www.infobright.org/Downloads/ContributedSoftware/

Leverage the Granular Engine - Optimizing Queries


Union All
  Faster than just using Union
Leveraging Columnar Architecture
  Only select necessary columns (avoid select *)

Original:
SELECT t1.a, sum(t2.b)
FROM t1 JOIN t2 ON t1.key = t2.key
WHERE t1.x > 0 AND t2.y = 5
GROUP BY t1.a;

Modified:
SELECT t1copy.a, sum(temp_tab.sum2)
FROM (
    SELECT t2.key AS k2, sum(t2.b) AS sum2
    FROM t1 JOIN t2 ON t1.key = t2.key
    WHERE t1.x > 0 AND t2.y = 5
    GROUP BY t2.key
) temp_tab, t1 t1copy
WHERE temp_tab.k2 = t1copy.key
GROUP BY t1copy.a;
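The modified query first aggregates per join key and only then groups by the descriptive column. A quick check on invented rows, again using sqlite3 as a stand-in engine rather than Infobright, shows that both forms return the same totals when t1.key is unique. In the sketch the key column is quoted because it is a keyword in some SQL dialects, and ORDER BY is added only so the two results compare deterministically.

```python
import sqlite3

# sqlite3 is a stand-in engine; the table contents are invented for the check.
conn = sqlite3.connect(":memory:")
conn.executescript('''
CREATE TABLE t1 ("key" INTEGER PRIMARY KEY, a TEXT, x INTEGER);
CREATE TABLE t2 ("key" INTEGER, b INTEGER, y INTEGER);
INSERT INTO t1 VALUES (1, 'red', 1), (2, 'red', 3), (3, 'blue', 2), (4, 'blue', 0);
INSERT INTO t2 VALUES (1, 10, 5), (1, 20, 5), (2, 5, 5), (3, 7, 5), (3, 1, 4), (4, 100, 5);
''')

ORIGINAL = '''
SELECT t1.a, sum(t2.b)
FROM t1 JOIN t2 ON t1."key" = t2."key"
WHERE t1.x > 0 AND t2.y = 5
GROUP BY t1.a
ORDER BY t1.a
'''

MODIFIED = '''
SELECT t1copy.a, sum(temp_tab.sum2)
FROM (
    SELECT t2."key" AS k2, sum(t2.b) AS sum2
    FROM t1 JOIN t2 ON t1."key" = t2."key"
    WHERE t1.x > 0 AND t2.y = 5
    GROUP BY t2."key"
) temp_tab, t1 t1copy
WHERE temp_tab.k2 = t1copy."key"
GROUP BY t1copy.a
ORDER BY t1copy.a
'''

original_rows = conn.execute(ORIGINAL).fetchall()
modified_rows = conn.execute(MODIFIED).fetchall()
print(original_rows)  # [('blue', 7), ('red', 35)]
print(modified_rows)  # [('blue', 7), ('red', 35)]
```

The inner query groups on the integer join key, so the wider column a is only read for the few rows that survive the aggregation.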

Leverage the Granular Engine - Data Types

Integers perform best
  Join columns
  Surrogate keys

Character best practice
  Sub-selects with surrogate keys
  Column option lookup: http://www.infobright.org/wiki/How_and_When_to_use_Lookups/
  Chksum columns on large strings
  Binary collations

Leverage the Granular Engine - Table Definitions


Original DDL
Create Table Customer(
    Customer_Key varchar(10),
    Customer_Name varchar(50),
    Customer_Address varchar(300),
    Category varchar(10));

Becomes
Create Table Customer(
    Customer_Key integer,
    Customer_Name varchar(50),
    Customer_Address varchar(300),
    Category varchar(10) comment 'lookup',
    Customer_Name_MD5 bigint,
    Customer_Address_MD5 bigint);

Original Query
SELECT ... FROM table WHERE str=value

Becomes
SELECT ... FROM table WHERE str=value AND cksum=cksum(value)
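A sketch of the checksum-column idea: store an integer checksum next to each long string and add it to the predicate, so the integer comparison can narrow the search before any strings are compared. The slides do not say how an MD5 value is reduced to a bigint, so folding the first 8 bytes of the digest into a signed 64-bit integer is an assumption, as is the helper name md5_bigint. sqlite3 again stands in for the database and the addresses are invented.

```python
import hashlib
import sqlite3
import struct


def md5_bigint(text):
    """Fold an MD5 digest into a signed 64-bit integer.

    Assumption: the first 8 digest bytes are used; the slides do not say
    how the checksum column is derived.
    """
    digest = hashlib.md5(text.encode("utf-8")).digest()
    return struct.unpack(">q", digest[:8])[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer ("
    "Customer_Key INTEGER, Customer_Address TEXT, Customer_Address_MD5 INTEGER)"
)

# Invented sample addresses; the checksum is computed once, at load time.
addresses = ["1 Main Street, Springfield", "22 Oak Avenue, Shelbyville"]
for key, address in enumerate(addresses, start=1):
    conn.execute(
        "INSERT INTO Customer VALUES (?, ?, ?)",
        (key, address, md5_bigint(address)),
    )

# Query with both the string predicate and the checksum of the searched value.
wanted = addresses[1]
rows = conn.execute(
    "SELECT Customer_Key FROM Customer "
    "WHERE Customer_Address = ? AND Customer_Address_MD5 = ?",
    (wanted, md5_bigint(wanted)),
).fetchall()
print(rows)  # [(2,)]
```

The string predicate stays in the query because two different strings can share a checksum; the integer column only narrows the search.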

Toolset Integration

Bear in Mind
The unique attributes of Infobright are transparent to developers.
The benefits are obvious and immediate to users.
Infobright is a relational database
Infobright observes and obeys SQL standards
Infobright observes and obeys standards-based connectivity
  Design tools
  Development tools
  Administrative tools
  Query and reporting tools

Infobright Development
When developing applications, you can use:
  Industry standard interfaces, including those listed below
  Comprehensive Management Services and Utilities
  Robust connectivity with BI Tools

Connector/ODBC
Connector/NET
Connector/J
Connector/MXJ
Connector/C++
Connector/C
C API
PHP API
Perl API
C++ API
Python API
Ruby APIs

Note: API calls are restricted to the functional support of the Brighthouse engine (e.g. mysql_stmt_insert_id).

Popular Database Tools


MySQL Workbench
Toad for MySQL on Windows
Navicat for MySQL on Mac
PHPMyAdmin

Points to Remember
Default port = 5029
No Explain Plan

Infobright Technology Partners

BI Tools
  MicroStrategy
  Jaspersoft
  Pentaho
  BIRT

ETL Tools
  Talend (aka Jasper ETL)
  Pentaho Data Integration

Getting Started Down the Right Path

Plan your Evaluation

Before starting your evaluation, define a concise set of target objectives and requirements
Remember that Infobright shows value with medium to large data sizes (>100GB and growing)
  Don't undersize the evaluation

A planning document is available that can help with the evaluation exercise

Migration Tools - ICE Breakers

The community has contributed code called ICE Breakers that can make data migration easier
Industry tools can be used but will require manual intervention

Additional Resources

Both infobright.com and infobright.org have additional documentation and white papers
Ask a question on the Infobright forum
If you are new to MySQL, you can also visit www.mysql.com for additional help
You can also visit the Infobright YouTube channel

Integrated VMs with Partner Solutions

Infobright Community Edition with Pentaho: http://www.infobright.org/Downloads/Pentaho_ICE_VM/
Infobright Community Edition with Talend: http://www.infobright.org/Downloads/Talend_VM/
Infobright Community Edition with Jaspersoft: http://www.infobright.org/Downloads/Jaspersoft_ICE_VM/
Infobright Community Edition with Jaspersoft AND Talend: http://www.infobright.org/Downloads/Talend_VM/

Questions?
For the open community
ICE Quick Start (http://www.infobright.org/wiki)
ICE FAQ (http://www.infobright.org/Resources/FAQ/)
ICE Data Loading Guide (http://www.infobright.org/wiki/Data_Loading/)
MySQL Online Tutorial (http://dev.mysql.com/doc/refman/5.1/en/tutorial.html)


For Licensed or Evaluation Customers


IEE Quick Start and Knowledgebase (http://www.infobright.com/Wiki/IEE_Wiki)
Screencasts (http://support.infobright.com/Training-Screencasts/)

Please check out our forums at (http://www.infobright.org/Forums)
