
Oracle Drivers configuration for High Availability: is it a developer's job?
Ludovico Caldara - Computing Engineer @CERN, Oracle ACE Director
Ludovico Caldara
■ Two decades of DBA experience (Not Only Oracle)

■ ITOUG co-founder

■ OCP (11g, 12c, MySQL) & OCE

■ Italian living in Switzerland

■ http://www.ludovicocaldara.net

■ @ludodba

■ ludovicocaldara
The Large Hadron Collider (LHC)
• Largest machine in the world: 27 km, 6000+ superconducting magnets
• Fastest racetrack on Earth: protons circulate 11245 times/s (99.9999991% of the speed of light)
• Emptiest place in the solar system: high vacuum inside the magnets
• Hottest spot in the galaxy: lead-ion collisions create temperatures 100 000 times hotter than the heart of the Sun
Large databases

SQL> select sum(bytes/power(1024,5)) as "PetaBytes"
  > from dba_data_files;

     PetaBytes
--------------
1.056794738695
Or complex ones
Oracle Cloud Infrastructure

New Free Tier


Always Free
Services you can use for unlimited time

oracle.com/gbtour
+
30-Day Free Trial
Free credits you can use for more services
unsplash.com/@helloquence
A new project is coming in your company
… and the development starts
Disclaimer
• Some oversimplifications
• A very complex topic
• Requires DBA and developer skills
• Assume you know some basic concepts
• High availability and failover concepts
• Connections to database
• Basic NET configurations
(SCAN, Listener, Services, TNS)
• Assume you have a recent DB and client (>=12.2)
"Failure happens all the time.
It happens every day in practice.
What makes you better
is how you react to it."
― Mia Hamm
What do you have to protect?
• New network session: Try - Wait - Failover
• Established network session: Wait - Retry - Wait - Failover - Replay query/transaction
Factors that influence HA
Too many!
• Network topology
• OS type and configuration
• DB version and service configuration
• Client version and type
• Application design / exception handling
Our mission today
Good white paper: Oracle Client Failover - Under the Hood, by Robert Bialek (Trivadis)
A concept that you must know
Database Services
• Virtual name for a database endpoint, registered with the listener
[Diagram: services HR_SVC, CRM_SVC and REP_SVC offered by the instances of a Real Application Clusters / Data Guard configuration]
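Database Services
• Active-Active (RAC, GoldenGate): the same service (e.g. HR_SVC) is offered by several instances or databases at once
• Active-Passive (RAC, Data Guard, RAC One Node): the service (e.g. REP_SVC) is offered by one instance or database at a time
[Diagram: services on a Real Application Clusters / Data Guard configuration]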

Database Services
• The DBA can create services with:
• srvctl add service
• the dbms_service.create_service() PL/SQL procedure
• Both methods have parameters for HA
• Hint: HA at service level is superfluous if the client is not configured properly
• Did you know? The service_names parameter is deprecated!
Oracle recommends against using the default services (DB_NAME or PDB_NAME) or the SID
Recommended descriptor (client >=12.2)
HR = (DESCRIPTION =
(CONNECT_TIMEOUT=120)(RETRY_COUNT=20)
(RETRY_DELAY=3)(TRANSPORT_CONNECT_TIMEOUT=3)
(ADDRESS_LIST =
(LOAD_BALANCE=on)
(ADDRESS=(PROTOCOL=TCP)(HOST=primary-scan)(PORT=1521)))
(ADDRESS_LIST =
(LOAD_BALANCE=on)
(ADDRESS=(PROTOCOL=TCP)(HOST=standby-scan)(PORT=1521)))
(CONNECT_DATA=(SERVICE_NAME = HR.cern.ch)))
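The descriptor above combines several retry parameters, and it helps to see roughly how long a client can keep trying. The Java sketch below is a simplified model, not Oracle Net's real algorithm. It assumes that each ADDRESS_LIST behaves as a single address (a real SCAN name usually resolves to several IP addresses, which makes the real timeline longer), that the address lists are traversed RETRY_COUNT + 1 times in total, that every failed TCP connect costs TRANSPORT_CONNECT_TIMEOUT seconds, and it deliberately ignores CONNECT_TIMEOUT. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the connect retry loop (not Oracle Net's code).
final class DescriptorRetryModel {

    // Attempt order: each pass tries every ADDRESS_LIST in order (FAILOVER between lists).
    // Assumption: the lists are traversed retryCount + 1 times in total.
    static List<String> attemptOrder(List<String> addressLists, int retryCount) {
        List<String> attempts = new ArrayList<>();
        for (int pass = 0; pass <= retryCount; pass++) {
            for (String host : addressLists) {
                attempts.add("pass " + pass + " -> " + host);
            }
        }
        return attempts;
    }

    // Worst case in seconds if every TCP connect times out: each attempt costs
    // transportConnectTimeout, and retryDelay is waited between consecutive passes.
    // CONNECT_TIMEOUT is intentionally left out of this simplified model.
    static int worstCaseSeconds(int addressLists, int retryCount,
                                int retryDelay, int transportConnectTimeout) {
        int passes = retryCount + 1;
        return passes * addressLists * transportConnectTimeout + retryCount * retryDelay;
    }

    public static void main(String[] args) {
        List<String> order = attemptOrder(List.of("primary-scan", "standby-scan"), 20);
        System.out.println(order.get(0));                   // pass 0 -> primary-scan
        System.out.println(order.get(1));                   // pass 0 -> standby-scan
        System.out.println(order.size());                   // 42
        System.out.println(worstCaseSeconds(2, 20, 3, 3));  // 186
    }
}
```

With the values of the HR descriptor (two lists, RETRY_COUNT=20, RETRY_DELAY=3, TRANSPORT_CONNECT_TIMEOUT=3) the model gives 21 passes of 2 attempts, i.e. 126 s of connect timeouts plus 60 s of delays: a client survives a failover that takes a few minutes instead of failing on the first unreachable address.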
Planned Maintenance
• CRM sessions exist on instance 1
• Need to restart instance 1
• Service relocation: new sessions go to instance 2
• Problem: what about existing sessions?
[Diagram: CRM_SVC on a Real Application Clusters / Data Guard configuration, relocated from instance 1 to instance 2]


How to drain sessions
• You need to know that the service is being relocated
• Use Fast Application Notification (FAN)!
• The client connects to CRM_SVC and registers with ONS
• When CRM_SVC stops on instance 1 and starts on instance 2, ONS sends a notification
• The client disconnects when the transaction is over and reconnects to CRM_SVC, now on instance 2
[Diagram: client, ONS and CRM_SVC on a Real Application Clusters / Data Guard configuration]
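The key point of draining is that the notification does not kill anything: it only tells the client to reconnect at the next transaction boundary. The Java sketch below is a hypothetical model of that application-side logic, not an Oracle API: the class, its methods and the instance names orcl1/orcl2 are illustrative, and the `Supplier` stands in for "connect to the service, wherever it runs now".

```java
import java.util.function.Supplier;

// Hypothetical model of application-side draining driven by a FAN "service down" event.
final class DrainingClient {
    private final Supplier<String> connectToService; // returns the instance now offering the service
    private String instance;                          // instance serving our current session
    private boolean drainRequested;                   // set by the (simulated) FAN callback
    private boolean inTransaction;

    DrainingClient(Supplier<String> connectToService) {
        this.connectToService = connectToService;
        this.instance = connectToService.get();
    }

    // Simulated FAN callback: never interrupt work, just remember the event.
    void onServiceDown(String downInstance) {
        if (downInstance.equals(instance)) {
            drainRequested = true;
            if (!inTransaction) reconnect(); // already at a boundary: drain right away
        }
    }

    void begin() { inTransaction = true; }

    // Commit is a transaction boundary: if draining was requested, reconnect now.
    void commit() {
        inTransaction = false;
        if (drainRequested) reconnect();
    }

    private void reconnect() {
        instance = connectToService.get(); // close the old session, open a new one
        drainRequested = false;
    }

    String instance() { return instance; }

    public static void main(String[] args) {
        String[] serviceLocation = {"orcl1"};
        DrainingClient client = new DrainingClient(() -> serviceLocation[0]);
        client.begin();
        serviceLocation[0] = "orcl2";          // the service is relocated to orcl2
        client.onServiceDown("orcl1");         // FAN event arrives mid-transaction
        System.out.println(client.instance()); // orcl1: the transaction is not interrupted
        client.commit();
        System.out.println(client.instance()); // orcl2: reconnected at the boundary
    }
}
```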


FAN at database side
• Grid Infrastructure is necessary to register with ONS
• ONS must be enabled (default remote port 6200)
• 18c: in-band notifications
• FAN-enabled service:
srvctl add service -db orcl -service hr_svc
  -rlbgoal [SERVICE_TIME | THROUGHPUT] # for the load balancing advisory
  -notification TRUE                   # for OCI/ODP.NET connections

srvctl relocate service -db orcl -service hr_svc
  -oldinst orcl1 -newinst orcl2
  -drain_timeout 10                    # give sessions some time to drain
  # -force is not specified, so sessions are not killed
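FAN at client side
import java.util.Properties;
import oracle.simplefan.FanEventListener;
import oracle.simplefan.FanManager;
import oracle.simplefan.FanSubscription;
import oracle.simplefan.LoadAdvisoryEvent;
import oracle.simplefan.NodeDownEvent;
import oracle.simplefan.ServiceDownEvent;
[...]
FanManager fanMngr = FanManager.getInstance();
Properties onsProps = new Properties();
onsProps.setProperty("onsNodes", "node1:6200,node2:6200"); // remote ONS daemons
fanMngr.configure(onsProps);
Properties props = new Properties();
props.setProperty("serviceName", "hr_svc"); // service whose events we want
FanSubscription sub = fanMngr.subscribe(props);
sub.addListener(new FanEventListener() {
    public void handleEvent(ServiceDownEvent event) {
        System.out.println("Service down event");
        System.out.println(event.getReason());
        // handle the event (e.g. drain the connections to the affected instance)
    }
    // FanEventListener also declares these callbacks: implement them or leave them empty
    public void handleEvent(NodeDownEvent event) { }
    public void handleEvent(LoadAdvisoryEvent event) { }
});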
Fast Connection Failover (FCF)
• Pre-configured FAN integration
• Works with connection pools
• The application must be pool aware (borrow/release)
• The connection pool leverages FAN events to:
• Quickly remove dead connections on a DOWN event
• (optional) Redistribute the load on an UP event
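What a pool does with FAN events can be modeled in a few lines. The Java sketch below is a hypothetical model of the behaviour listed above, not UCP's implementation or API: the class, the `Conn` type and the instance names are illustrative. It only covers the planned case, in which a connection that is borrowed when the DOWN event arrives is discarded when the application releases it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of how a pool can use FAN DOWN/UP events (not UCP's code).
final class FcfPoolModel {
    static final class Conn {
        private final int id;
        private final String instance;
        Conn(int id, String instance) { this.id = id; this.instance = instance; }
        int id() { return id; }
        String instance() { return instance; }
    }

    private final Deque<Conn> idle = new ArrayDeque<>();   // available for borrow
    private final Set<Conn> borrowed = new HashSet<>();    // in use by the application
    private final Set<String> downInstances = new HashSet<>();
    private int nextId;

    void open(String instance) { idle.add(new Conn(nextId++, instance)); }

    Conn borrow() {
        Conn c = idle.poll();
        if (c == null) throw new IllegalStateException("pool exhausted");
        borrowed.add(c);
        return c;
    }

    // Release: connections to a DOWN instance are discarded instead of reused.
    void release(Conn c) {
        borrowed.remove(c);
        if (!downInstances.contains(c.instance())) idle.add(c);
    }

    // DOWN event: idle connections to that instance are removed at once,
    // without waiting for a TCP timeout on each of them.
    void onDown(String instance) {
        downInstances.add(instance);
        idle.removeIf(c -> c.instance().equals(instance));
    }

    // UP event (optional): open new connections so the load is redistributed.
    void onUp(String instance, int newConnections) {
        downInstances.remove(instance);
        for (int i = 0; i < newConnections; i++) open(instance);
    }

    int idleCount(String instance) {
        int n = 0;
        for (Conn c : idle) if (c.instance().equals(instance)) n++;
        return n;
    }

    public static void main(String[] args) {
        FcfPoolModel pool = new FcfPoolModel();
        pool.open("orcl1"); pool.open("orcl1");
        pool.open("orcl2"); pool.open("orcl2");
        Conn busy = pool.borrow();   // connection 0, on orcl1
        pool.onDown("orcl1");        // FAN DOWN event for orcl1
        pool.release(busy);          // discarded, not returned to the pool
        System.out.println(pool.idleCount("orcl1") + " " + pool.idleCount("orcl2")); // 0 2
    }
}
```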
Fast Connection Failover (FCF)
• UCP (Universal Connection Pool, ucp.jar) and WebLogic Active GridLink handle FAN out of the box.
No code changes! Just enable FastConnectionFailoverEnabled.
• Third-party connection pools can implement FCF if:
• the JDBC driver version is >= 12.2
• simplefan.jar and ons.jar are in the CLASSPATH
• connection validation options are set in the pool properties
• the pool plugs in javax.sql.ConnectionPoolDataSource
• the pool checks connections at borrow/release
Fast Connection Failover (FCF)
• The OCI Connection Pool handles FAN events as well
• oraaccess.xml must be configured properly in TNS_ADMIN
• Python's cx_Oracle, PHP OCI8, etc. have native options
• ODP.NET: just set "HA Events=true;Pooling=true"

Session Draining in 18c
• The database invalidates a draining session when the application runs:
• Standard connection validity tests
(conn.isValid(), CheckConStatus, OCI_ATTR_SERVER_STATUS)
• Custom SQL validity tests (DBA_CONNECTION_TESTS), for example:
• SELECT 1 FROM DUAL
• SELECT COUNT(*) FROM DUAL
• SELECT 1
• BEGIN NULL;END
• New tests can be added:
execute dbms_app_cont_admin.add_sql_connection_test(
  'select * from dual', '<service_name>');
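The mechanism above relies on the pool's own validity test: once a session is marked for draining, a recognized test no longer reports it as valid, so the pool closes it and opens a new connection where the service is running. The Java sketch below is a hypothetical model of that interaction, not Oracle's implementation: the class and method names are invented, the case-insensitive matching of test statements is an assumption, and `register` only stands in for `dbms_app_cont_admin.add_sql_connection_test`.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical model of 18c session draining through connection tests (not Oracle's code).
// Assumption: test statements are matched case-insensitively after trimming.
final class DrainOnConnectionTest {
    private final Set<String> registeredTests = new HashSet<>();

    DrainOnConnectionTest() {
        // Two of the tests listed for DBA_CONNECTION_TESTS on the slide.
        register("SELECT 1 FROM DUAL");
        register("SELECT COUNT(*) FROM DUAL");
    }

    // Stand-in for dbms_app_cont_admin.add_sql_connection_test(...).
    void register(String sql) { registeredTests.add(normalize(sql)); }

    // What the pool learns from its validity test: a session marked for draining
    // fails a recognized test, so the pool closes it and opens a new connection.
    boolean testSucceeds(String sql, boolean sessionMarkedForDraining) {
        boolean recognized = registeredTests.contains(normalize(sql));
        return !(recognized && sessionMarkedForDraining);
    }

    private static String normalize(String sql) {
        return sql.trim().toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        DrainOnConnectionTest db = new DrainOnConnectionTest();
        System.out.println(db.testSucceeds("select 1 from dual", true)); // false
        System.out.println(db.testSucceeds("select * from dual", true)); // true (not registered)
        db.register("select * from dual");
        System.out.println(db.testSucceeds("select * from dual", true)); // false
    }
}
```

The practical consequence is the one on the slide: if the pool validates connections with a statement the database does not recognize, draining cannot happen at that point.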
“Have we implemented FAN/FCF correctly?”
• TEST, TEST, TEST
• Relocate services as part of your CI/CD
• Application ready for planned maintenance
=> happy DBA, Dev, DevOps
Why draining?
• Draining is the best solution for hiding planned maintenance
• No draining → killing the persisting sessions → unplanned maintenance from the application's perspective
unsplash.com/@darmfield
Unplanned Maintenance
Unplanned Maintenance (failover)
• CRM sessions exist on instance 1
• The instance crashes. What about running sessions/transactions?
• (The same applies to any maintenance that terminates sessions in the middle of a transaction)
[Diagram: CRM_SVC on instance 1 of a Real Application Clusters / Data Guard configuration]


Transparent Application Failover (TAF)
• For OCI drivers only
• Automates reconnect
• Allows resumable queries (session state restored in 12.2)
• Transactions and PL/SQL calls not resumed (rollback)
[Diagram: a query fetching rows over Oracle Net when the connection fails. Before the failure, part of the rows are fetched and the rest are lost. After failover the query is re-executed: the rows already fetched are discarded, and the lost rows are fetched.]
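The Fetched / Lost / Discarded sequence can be expressed as a tiny simulation. The Java sketch below is a hypothetical model of TAF with TYPE=SELECT for a single cursor, not the OCI client code: the class and method names are invented, and it assumes the re-executed query returns the same rows in the same order, which is what makes skipping the already-fetched rows meaningful.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of TAF TYPE=SELECT for one cursor (not the OCI client code).
// Assumption: the re-executed query returns the same rows in the same order.
final class TafSelectModel {
    static final class Result {
        final List<Integer> delivered = new ArrayList<>(); // rows seen by the application
        int discardedOnReplay;                              // rows re-fetched and thrown away
    }

    // Rows are numbered 1..totalRows; the connection dies after failAfter rows.
    static Result fetchWithFailover(int totalRows, int failAfter) {
        Result r = new Result();
        int fetched = Math.min(failAfter, totalRows);
        for (int row = 1; row <= fetched; row++) r.delivered.add(row); // "Fetched"
        if (fetched == totalRows) return r;                              // no failure hit

        // Failover: reconnect and re-execute the query from the beginning.
        for (int row = 1; row <= totalRows; row++) {
            if (row <= fetched) {
                r.discardedOnReplay++;   // already delivered: "Discarded"
            } else {
                r.delivered.add(row);    // previously "Lost", now "Fetched"
            }
        }
        return r;
    }

    public static void main(String[] args) {
        Result r = fetchWithFailover(10, 4);
        System.out.println(r.delivered);         // [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
        System.out.println(r.discardedOnReplay); // 4
    }
}
```

The application sees every row exactly once; the cost of the failover is re-reading (and discarding) what it had already fetched.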
Transparent Application Failover (TAF)
Server side:
srvctl add service -db orcl -service hr_svc
  -failovertype SELECT -failoverdelay 1 -failoverretry 180
  -failover_restore LEVEL1 # restores session state (>=12.2)
  -notification TRUE

Client side:
HR = (DESCRIPTION =
(FAILOVER=ON) (LOAD_BALANCE=OFF)
(ADDRESS=(PROTOCOL=TCP)(HOST=server1)(PORT=1521))
(CONNECT_DATA =
(SERVICE_NAME = HR.cern.ch)
(FAILOVER_MODE =
(TYPE = SESSION)
(METHOD = BASIC)
(RETRIES = 180)
(DELAY = 1)
)))
Fast Connection Failover and FAN
• Like for planned maintenance, but…

• Connection pool recycles dead connections

• Application must handle all the exceptions

• FAN avoids TCP timeouts!


Application Continuity (AC)
• Server-side Transaction Guard (included in EE)
• Transaction state is recorded upon request
• Client-side Replay Driver
• Keeps a journal of transactions
• Replays transactions upon reconnect

• JDBC thin 12.1, OCI 12.2
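The division of labour between the Replay Driver (journal and replay) and Transaction Guard (commit outcome) can be sketched in plain Java. This is a hypothetical model of the idea, not the Replay Driver's implementation or API: the class is invented, its method names only echo request boundaries, and the commit outcome that Transaction Guard would provide is simply passed in as a boolean.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical model of the replay idea behind Application Continuity.
// Not the Replay Driver's API: names are invented, and the Transaction Guard
// commit outcome is passed in as a boolean.
final class ReplayJournalModel {
    private final List<String> journal = new ArrayList<>();
    private boolean inRequest;

    void beginRequest() { journal.clear(); inRequest = true; }

    // Records the call (only inside a request), then sends it to the database.
    void execute(String call, Consumer<String> database) {
        if (inRequest) journal.add(call);
        database.accept(call);
    }

    void endRequest() { journal.clear(); inRequest = false; }

    // After a recoverable error: if the transaction already committed, replaying
    // would apply it twice, so nothing is replayed. Otherwise the journaled calls
    // are re-executed on the new connection. Returns the number of replayed calls.
    int recover(boolean alreadyCommitted, Consumer<String> newConnection) {
        if (alreadyCommitted) return 0;
        for (String call : journal) newConnection.accept(call);
        return journal.size();
    }

    public static void main(String[] args) {
        ReplayJournalModel driver = new ReplayJournalModel();
        List<String> instance1 = new ArrayList<>();
        driver.beginRequest();
        driver.execute("insert into orders values (1)", instance1::add);
        // instance 1 crashes; Transaction Guard reports "not committed"
        List<String> instance2 = new ArrayList<>();
        System.out.println(driver.recover(false, instance2::add)); // 1
        driver.endRequest();
    }
}
```

Without the commit outcome the driver could not tell a lost commit from a failed one, which is why Transaction Guard is the server-side half of AC.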


Application Continuity (AC)
• AC with UCP: no code change
PoolDataSource pds = PoolDataSourceFactory.getPoolDataSource();
pds.setConnectionFactoryClassName("oracle.jdbc.replay.OracleDataSourceImpl");
...
conn = pds.getConnection(); // Implicit database request begin
// calls protected by Application Continuity
conn.close(); // Implicit database request end

• AC without connection pool: code change


OracleDataSourceImpl ods = new OracleDataSourceImpl();
conn = ods.getConnection();
...
((ReplayableConnection)conn).beginRequest(); // Explicit database request begin
// calls protected by Application Continuity
((ReplayableConnection)conn).endRequest(); // Explicit database request end
Application Continuity (AC)
Service definition:
srvctl add service -db orcl -service hr
  -failovertype TRANSACTION # enable Application Continuity
  -commit_outcome TRUE      # enable Transaction Guard
  -failover_restore LEVEL1  # restore session state before replay
  -retention 86400          # commit outcome retained for 1 day
  -replay_init_time 900     # replay is not initiated after 900 seconds
  -notification true

Special configuration to retain mutable values at replay:
GRANT KEEP SEQUENCE ON <SEQUENCE> TO <USER>;
GRANT KEEP DATE TIME TO <USER>;
GRANT KEEP SYSGUID TO <USER>;
Transparent Application Continuity (TAC)
• “New” in 18c for JDBC thin, 19c for OCI
• Records session and transaction state server-side
• No application change
• Replayable transactions are replayed
• Non-replayable transactions raise exception
• Good driver coverage but check the doc!
• Side effects are never replayed
Transparent Application Continuity (TAC)
Service definition:
srvctl add service -db orcl -service hr
  -failover_restore AUTO  # enable Transparent Application Continuity
  -failovertype AUTO      # enable Transparent Application Continuity
  -commit_outcome TRUE    # enable Transaction Guard
  -retention 86400        # commit outcome retained for 1 day
  -replay_init_time 900   # replay is not initiated after 900 seconds
  -notification true

Special configuration to retain mutable values at replay:
GRANT KEEP SEQUENCE ON <SEQUENCE> TO <USER>;
GRANT KEEP DATE TIME TO <USER>;
GRANT KEEP SYSGUID TO <USER>;
Still not clear?
• Fast Application Notification to drain sessions
• Application Continuity for full control (code change)
• Transparent Application Continuity for good HA (no code change)
Connection Manager in Traffic Director Mode
CMAN with an Oracle Client “brain”
Classic vs TDM
• Classic (CLIENT → cman → DB): SQL*Net is redirected transparently
• TDM (CLIENT → cman → DB): CMAN is the end point of client connections and opens its own connection to the DB
Session Failover with TDM
• Client connects to cman:1521/pdb1
• Cman opens a connection to pdb1
• Upon PDB/service relocation, cman detects the stop and closes the connections at transaction boundaries
• The next request is executed on the surviving instance
• The client-cman connection is intact: the client does not experience a disconnection
[Diagram: CLIENT → cman → PDB1, with PDB1 open on two CDBA instances]
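The difference from application-side draining is where the reconnect happens: in TDM mode cman swaps its own backend connection, while the client's connection to cman never changes. The Java sketch below is a hypothetical model of that behaviour, not CMAN's implementation: the class, its methods and the instance labels are invented, and the SQL text is not interpreted, only whether the call leaves a transaction open.

```java
import java.util.function.Supplier;

// Hypothetical model of session failover in CMAN Traffic Director Mode (not CMAN's code).
final class TdmProxyModel {
    private final int clientConnectionId;          // client <-> cman: never replaced
    private final Supplier<String> locateService;  // returns the instance now offering pdb1
    private String backend;                        // cman <-> database: may be replaced
    private boolean stopSeen;
    private boolean inTransaction;

    TdmProxyModel(int clientConnectionId, Supplier<String> locateService) {
        this.clientConnectionId = clientConnectionId;
        this.locateService = locateService;
        this.backend = locateService.get();
    }

    // cman detects the service stop: close the backend now only if no transaction is open.
    void onServiceStop() {
        stopSeen = true;
        if (!inTransaction) closeBackend();
    }

    // Runs one client call and returns the instance that served it. endsTransaction is
    // true when no transaction is open after the call (commit, rollback, plain query).
    String execute(String sql, boolean endsTransaction) {
        if (backend == null) backend = locateService.get(); // reopen on the surviving instance
        String servedBy = backend;
        inTransaction = !endsTransaction;
        if (endsTransaction && stopSeen) closeBackend();    // transaction boundary reached
        return servedBy;
    }

    private void closeBackend() {
        backend = null;
        stopSeen = false;
    }

    int clientConnectionId() { return clientConnectionId; }

    public static void main(String[] args) {
        String[] pdb1Location = {"CDBA instance 1"};
        TdmProxyModel cman = new TdmProxyModel(1, () -> pdb1Location[0]);
        cman.execute("update emp set sal = sal + 1", false);
        pdb1Location[0] = "CDBA instance 2";                          // PDB/service relocated
        cman.onServiceStop();
        System.out.println(cman.execute("commit", true));             // CDBA instance 1
        System.out.println(cman.execute("select 1 from dual", true)); // CDBA instance 2
        System.out.println(cman.clientConnectionId());                // 1
    }
}
```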
Magic does not happen: you need to plan
Thank you!