
Decision One

On working days we discussed at length the tools used for database
monitoring: where the servers are, which gateways are used, the database
naming conventions, where to concentrate, where to look up a particular
error and how to resolve it, what the steps to escalate are, and whom to
mail about the issue. He showed me where the password file is located and gave me the
password for that file, database.kdb. He explained the purpose of each database and had me
identify which are the production databases and which are the development databases.
Duane keeps me updated on the work done on the databases, for example
applying patches for the DST issues, the weekly maintenance done on the
databases, and recycling the databases.
Duane has sent me many documents with instructions on how to
get into the tool, how to monitor, and so on. He explained what the Foglight messages are,
which ones to keep, and which ones to delete. Duane always drives, and sometimes I
drive to dig into issues under his guidance.
The following introduction is the gist of our discussions so far.

First of all, we need to establish a VPN connection to the DecisionOne VPN server to
access the servers at DecisionOne.

The following tasks need to be performed on a daily basis:

1. Foglight Monitoring: Monitor the health of the production databases
using the Foglight Operations Console [FOC].
2. Control-M Monitoring: Monitor the daily backup scripts scheduled
in Control-M.
3. BCTOP Database Status: Check that the BCTOP database data load has
finished successfully.
4. Crystal Reports: Send the Crystal Reports scheduled on the reports server.

Let us see the above tasks in detail:

1. Foglight Monitoring

Monitoring is mainly needed for the production servers that run Oracle
databases. Duane sent me a document that describes in detail the messages to
monitor in the Foglight tool, for example which messages need attention and which
messages should be cleared from the alert area.
We noticed that one alert message (Checkpoint Not Complete) comes
every day from the development database d1mid01 on server dsdbfra002. To
resolve this error we first planned to change a parameter. On 14th Feb we set the init
parameter LOG_CHECKPOINT_INTERVAL to a high value and restarted the database
with the new parameter file. The alert message continued to appear, so we planned to add
log groups and increase the size of the existing redo log files. On 21st Feb we increased the redo
log file size from 1M to 20M and added 3 redo log groups, each having 2 files of
20M, to resolve the issue.
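The notes do not record the exact statements used on 21st Feb. As a rough sketch only, the addition of 3 groups with 2 members of 20M each could be generated as Oracle `ALTER DATABASE ADD LOGFILE GROUP` statements like this. The starting group number, the directory and the file names are hypothetical placeholders, not values from d1mid01.

```python
def add_logfile_statements(first_group, group_count, members_per_group, size, directory):
    """Build one ALTER DATABASE ADD LOGFILE statement per new redo log group.

    All arguments are illustrative; the real group numbers and file
    locations must come from the target database.
    """
    statements = []
    for group in range(first_group, first_group + group_count):
        # Members of a group are named redoNNa.log, redoNNb.log, ...
        members = ", ".join(
            f"'{directory}/redo{group:02d}{chr(ord('a') + m)}.log'"
            for m in range(members_per_group)
        )
        statements.append(
            f"ALTER DATABASE ADD LOGFILE GROUP {group} ({members}) SIZE {size};"
        )
    return statements


# Hypothetical values: groups 4 to 6 under an assumed data directory.
for statement in add_logfile_statements(4, 3, 2, "20M", "/u01/oradata/d1mid01"):
    print(statement)
```

The generated statements are meant to be reviewed and then run by a DBA in a SQL session; the sketch itself does not connect to any database.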
Another warning message also comes frequently: “Log files are
not copied more than one location”. Duane said that DecisionOne has no
policy of copying the log files to more than one location, so on 21st Feb we removed the
mail-sending option from the agent's Ora_Archive_Multiplex rule.
We record the changes made to the tool in the file “Fog light Product Change Log” in the
shared directory.

We need to check the health of the databases on a daily basis. For this we use
FOC.

1. Foglight Monitoring:

Foglight Install.

DBA Script - Foglight monitoring.doc

1- Review critical errors (a published list has not yet been compiled, this may be
developed as we go through the review of this document).

2- Review functionality.
3- Monitoring needed only for production databases.
4- When an issue is worked, document the resolution, send an email to
duane.wilcox@decisionone.com and david.baker@DecisionOne.com.
5- To get the description of an error, right-click (rc) on the message, click ‘show
details’.

6- To clear the error (like the one above), rc the error and select ‘clear event’.
7- To look at an agent’s rules (like oracle_FS80PR), highlight the agent, rc, select
‘edit’, click on ‘rules’.
8- This screen shows all of the rules defined.

9- Double-click on the Ora_Can_Connect rule to see how the rule is defined.


You notice the different levels of warning – Normal / Warning / Critical / Fatal.
Many of these messages have been modified for D1 specifically.
10- Go back to the oracle_FS80PR agent, double-click / actions / ASPs.
11- These are the Automated Startup Parameters for the agent, and are specific to the
agent.

12- Close that out. On main Foglight Operations Console, click ‘Tools’, then
‘Foglight Registry’.
13- Each of the areas has info to maintain, but we normally work in the ‘World’
area.

14- I showed you how to change the DBA_PAGER variable.


15- Here you can see that the ‘Production’ definition, which is under the ‘WORLD’
directory, defines each of our production unix servers (located in Auburn Hills).
What this does is allow us to override the CPUWarning, CPUFatal and
CPUCritical variables for just the production servers.
16- And, on the Development side, you see we override the value of DBA_PAGER
with an email id (an invalid email id). This prevents pages from being sent to the
pager for the development servers.
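The override behaviour described in steps 15 and 16 can be pictured as a scoped lookup: a group-level value wins, otherwise the ‘WORLD’ value applies. The sketch below is plain Python for illustration only and is not Foglight's own API or storage format. Every threshold and address in it is a made-up placeholder.

```python
# Hypothetical WORLD-level defaults (values are placeholders).
WORLD = {
    "CPUWarning": 80,
    "CPUCritical": 90,
    "CPUFatal": 95,
    "DBA_PAGER": "dba-pager@example.com",
}

# Hypothetical group-level overrides, mirroring steps 15 and 16.
OVERRIDES = {
    "Production": {"CPUWarning": 85, "CPUCritical": 93, "CPUFatal": 98},
    "Development": {"DBA_PAGER": "nobody@invalid.example"},
}


def resolve(variable, group=None):
    """Return the group override if one exists, else the WORLD default."""
    if group is not None and variable in OVERRIDES.get(group, {}):
        return OVERRIDES[group][variable]
    return WORLD[variable]


print(resolve("CPUWarning", "Production"))   # group override
print(resolve("DBA_PAGER", "Development"))   # pages go to an invalid address
print(resolve("CPUWarning", "Development"))  # falls back to WORLD
```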

2. Control-M Monitoring:

Control-M - Xsession Procedures Used to Access the Software

Hummingbird Exceed Session

1. Configure Xterm software. I use Hummingbird Exceed, which runs on my
laptop.
2. Double-click the software to start up a session.
3. DBA Script - Control-m monitor.doc

Hummingbird Xsession Control-M Logon Commands

1. On SMF001, start a Unix session.
2. Login with: ecs
3. Password: xxxx
4. The DISPLAY variable must be set. If this system parameter is not set, you will
get a "Cannot open display" error.

NOTE: The user profile has been changed for most Unix accounts to
automatically set the DISPLAY variable, so for the most part step 4 is done
for you. If an error message says that DISPLAY has not been set, do
the following steps:

- Find the IP address of the laptop / PC you are on. It can be obtained by
running ipconfig at a command prompt.
- In the smf001 session, at the prompt, enter "export
DISPLAY=xxx.xxx.xxx.xxx:0.0", where the xxx's are your ISP-assigned
TCP/IP address.

5. At the prompt enter: root_menu


6. Enter the ECS Username : ecs
7. Enter the Password: password
8. Select Option 1 from the ENTERPRISE/CS ROOT MENU
9. The Enterprise Controlstation menu appears (no action on this menu).

10. When the ‘Load Net’ menu appears, click on the middle ‘LOAD’ icon to open.
11. When the ‘SHOW’ menu appears, click on the upper left-hand icon to close it;
it is not needed.

12. The ECS gui (Enterprise Controlstation Network View) display window can
be resized by dragging the sides of the window.
13. Here’s the size of window that focuses on the active job list (jobs loaded for
today’s schedule).
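For the manual DISPLAY fix described in the note under step 4, a small helper can build the export line from the address reported by ipconfig and reject a mistyped address before it is pasted into the smf001 session. This is a minimal sketch; the function name and the sample address are illustrative assumptions.

```python
import ipaddress


def display_export_command(ip, display=0, screen=0):
    """Return the shell line that points DISPLAY at the given IPv4 address.

    Raises ValueError if the address is not a valid IPv4 address.
    """
    address = ipaddress.IPv4Address(ip)
    return f"export DISPLAY={address}:{display}.{screen}"


# 192.0.2.15 is a documentation-only address used as a stand-in.
print(display_export_command("192.0.2.15"))
```

The printed line is what gets entered at the Unix prompt; the helper only formats and validates, it does not change any environment itself.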

Control-M – Displaying a Job

1. Get down to the job detail level in Control-M.

2. Right-click on the job. On the intermediate menu, select ‘Details’. This view
shows the detail of the current job from the active job file (a scheduled job).
3. Back on the Enterprise Controlstation Network View screen, right-click and
click on ‘View Sysout List’. This screen shows the list of job output from the
scheduled job. There are multiple entries if the job has been rerun.
Double-click on this entry to review the output of this backup job.

4. To rerun a job - after review of the sysout, and correcting any errors, go back
to the ECS Display. Right-click the job and click RERUN. The job will be
re-submitted under the job owner (oracle8i in this example).

Control-M - Forcing a Job to Run

From the first ECS Menu, click Scheduling, then Scheduling Definitions
Double Click the TABLE Definition you want

Highlight the Job you want to rerun


Select Menu
Force

Then Control-M will come back with an intermediate screen…


Click “Confirm” to run the job.
Job will have the following summary shown.

Back in Enterprise Controlstation Network View Window:


Right Click on Job that was re-submitted.
Why – Error Missing Condition message is ok

Highlight this condition and click ‘Add Prerequisite Condition’.


Intermediate screen comes up, click ‘Confirm’.
Close Why screen.
Job should be running. The Yellow color indicates the job is running.

Enterprise Controlstation Network View Window:

Right Click Item – Select Free


Selected Job should turn yellow and execute

Color Indication:

Green – completed
Yellow – executing.
Red – error, needs to be fixed
Grey – awaiting resource, right-click on the job and select ‘WHY’
Blue – awaiting resource, right-click on the job and select ‘WHY’
Resolve the WHY condition if necessary.
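As a quick reference, the colour legend above can be written as a lookup table that returns the job state and the next action. This is only an illustrative sketch of the list as written in these notes; the names are assumptions and nothing here talks to Control-M.

```python
# Job colours from the legend: colour -> (state, next action or None).
JOB_COLORS = {
    "green": ("completed", None),
    "yellow": ("executing", None),
    "red": ("error", "fix the error, then rerun the job"),
    "grey": ("awaiting resource", "right-click the job and select WHY"),
    "blue": ("awaiting resource", "right-click the job and select WHY"),
}


def next_action(color):
    """Return a short instruction for a job shown in the given colour."""
    state, action = JOB_COLORS[color.lower()]
    if action is None:
        return f"{state}: no action needed"
    return f"{state}: {action}"


for color in ("Green", "Red", "Grey"):
    print(color, "->", next_action(color))
```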
