
Big Data Integration with Oracle BI

Role          Name                 Signature    Date
Prepared By   Sriram Subbarayan                 29th Oct 2014
Reviewed By
Approved By


Table of Contents

1. Introduction
2. What is Hadoop?
3. How does Hadoop process data stored in multiple nodes?
4. What is Hive?
5. Why does this matter in the Oracle Business Intelligence / Analytics space?
6. How do you integrate OBIEE 11g with Hadoop?


Big Data Integration with OBIEE 11g

1. Introduction
The latest release of Oracle Business Intelligence, 11.1.1.7, reflects Oracle's continued effort to integrate its Business Intelligence platform with big data technologies such as Hadoop and Hive. Specifically, this document looks at OBIEE 11g's ability to use Hadoop as a data source. Let us see how to integrate OBIEE 11g with a Hadoop/Big Data platform.

2. What is Hadoop?
Hadoop is a framework that distributes data across many servers (nodes) in what is commonly referred to as a 'distributed file system'. The data is not stored in a single database; rather, it is spread across the nodes of a cluster.
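
To make this concrete, the sketch below uses Hadoop's FileSystem Java API to list files stored in HDFS. It is a minimal illustration only; the NameNode host/port and the /user/data path are hypothetical and depend on your cluster.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ListHdfsFiles {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // fs.defaultFS points at the NameNode; host and port are hypothetical.
            conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");
            FileSystem fs = FileSystem.get(conf);
            // Each file is transparently split into blocks spread across the cluster.
            for (FileStatus status : fs.listStatus(new Path("/user/data"))) {
                System.out.println(status.getPath() + " (" + status.getLen() + " bytes)");
            }
            fs.close();
        }
    }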

3. How does Hadoop process data stored in multiple nodes?
Hadoop uses a programming model called 'MapReduce' for parallel processing across multiple nodes. At a high level this comprises two steps:


3.1 Map step

The map step takes the input data, divides it into smaller sets, and distributes those sets to worker nodes.

3.2 Reduce step

The reduce step collects the results from all of the worker nodes and aggregates them into a single output.
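
As an illustration, here is the canonical word-count example written against the Hadoop MapReduce Java API. It is a minimal sketch with class names of our own choosing; a complete job would also need a driver class to configure and submit it.

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // Map step: split each input line into words and emit (word, 1) pairs.
    public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce step: sum the per-word counts collected from all worker nodes.
    class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }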

4. What is Hive?
MapReduce functions are generally written in Java and require deep knowledge of both Hadoop and MapReduce. Engineers at Facebook created a technology called 'Hive', a data warehouse infrastructure that sits on top of Hadoop. Put simply, Hive does the 'heavy lifting' of creating the MapReduce functions: to query a Hadoop distributed file system, instead of writing MapReduce code, you write SQL-style statements in a Hive language called 'HQL'.
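
For example, the query in the sketch below aggregates rows per product without any hand-written MapReduce code; Hive compiles the HQL into MapReduce jobs behind the scenes. The connection goes through Hive's JDBC driver (HiveServer2); the host, port, credentials, and 'sales' table are hypothetical.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveQueryExample {
        public static void main(String[] args) throws Exception {
            // HiveServer2 host, port, and database are hypothetical.
            String url = "jdbc:hive2://hiveserver.example.com:10000/default";
            try (Connection conn = DriverManager.getConnection(url, "hive", "");
                 Statement stmt = conn.createStatement();
                 // SQL-style HQL; Hive turns this into MapReduce jobs.
                 ResultSet rs = stmt.executeQuery(
                         "SELECT product, SUM(amount) FROM sales GROUP BY product")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                }
            }
        }
    }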

5. Why does this matter in the Oracle Business Intelligence / Analytics space?
The analytics space is experiencing a shift in both technology and function. Traditional BI projects required a 'data warehouse' to store data in a series of star schemas (denormalized models) for quick query generation and data retrieval. Developing and supporting the data warehouse falls to a team of ETL developers whose main focus is to create the mappings that transform data from the source into the target.
Unless the functional requirements are clearly understood during this phase, value is usually lost in the data transformation, and relevant data can easily be eliminated along the way.


Using OBIEE 11g's Hadoop integration via a Hive ODBC driver, OBIEE can directly query distributed file systems through Hive. What does this mean? The potential now exists to eliminate or reduce the need for ETL, since we can now query very large file systems directly.
The saving grace for ETL developers is that someone is still needed to write the HQL that populates the 'tables' OBIEE uses. Ultimately, this may change how ETL is developed rather than eliminate it.

6. How do you integrate OBIEE 11g with Hadoop?

Let us walk through the process step by step:

6.1 Step 1: Download the Hive ODBC drivers from http://support.oracle.com

For details, refer to the Oracle note 'Using Oracle Hadoop ODBC Driver with BI Administration Tool [ID 1520733.1]'.


6.2 Step 2: Create a Hive ODBC connection via the ODBC Data Source Administrator

Just as you would create an ODBC connection to edit the repository online, create an ODBC connection, but this time specify the driver as 'Oracle Apache Hadoop Hive WP Driver'.

Once you have created the ODBC data source connection, you can configure the driver settings under the 'General' tab.


6.3 Step 3: Configure the database connection

Moving into the repository, create a new database connection as you would for any data source in the physical layer. Note that you must specify the database type as 'Apache Hadoop' (this is important!).


6.4 Step 4: Create the connection pool

Within the Apache Hadoop database connection you just created in step 3, create a connection pool with a call interface of type 'ODBC 2.0' or 'ODBC 3.5'. The call interface should not be 'Apache Hadoop' (you have already specified the database type as Apache Hadoop!); if you set it to 'Apache Hadoop', you will receive an error.


With the connection pool in place, you should be able to import your tables and columns just like any other connection pool. The BI Server will generate normal SQL statements as if it were querying a traditional Oracle database; the Hive ODBC driver in turn converts that SQL to HQL, which runs MapReduce functions to query the Hadoop distributed file system across multiple nodes.
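
If you want to verify the DSN outside of OBIEE first, a quick sanity check is sketched below. It assumes a Windows system DSN named 'Hive_DSN' (a hypothetical name for the data source created in step 2), a hypothetical 'sales' table, and Java 7's built-in JDBC-ODBC bridge, which was removed in Java 8.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class OdbcDsnSanityCheck {
        public static void main(String[] args) throws Exception {
            // Java 7's JDBC-ODBC bridge (removed in Java 8 and later).
            Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
            // 'Hive_DSN' is a hypothetical name for the DSN created in step 2.
            try (Connection conn = DriverManager.getConnection("jdbc:odbc:Hive_DSN");
                 Statement stmt = conn.createStatement();
                 // Plain SQL goes in; the Hive ODBC driver translates it to HQL.
                 ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM sales")) {
                if (rs.next()) {
                    System.out.println("Row count: " + rs.getLong(1));
                }
            }
        }
    }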
