
Item Checked

Panel Yes
Pontis Up? Yes
Full GC? No
Communications Yes
Benefits Yes

Servers Servers are up and Running
Runtime Servers
Full GC check
Oracle DB
Runtime All RT jobs in success status
Check status of outgoing comms and benefits
System KPIs (verify according to business activity)
Check Watermark
Check customer feed
Oracle DB Tablespace usage
Check arrival of statistics files by e-mail
UI01 machine has enough space for DCS run
Analyst Talkmobile_Migration

The job failed. Unable to determine if the owner (PONTIS\Maya.Peleg) of job Analyst Talkmobile_Migration has server access (reason: Could not obtain information about Windows NT group/user 'PONTIS\Maya.Peleg', error code 0x6ba. [SQLSTATE 42000] (Error 15404)).
Outcome Notes

Last Verification Date Status

18/04/2018 V
18/04/2018 V
18/04/2018 V
18/04/2018 V
18/04/2018 V
18/04/2018 V CDR Source Markup
18/04/2018 V
18/04/2018 V
18/04/2018 V
18/04/2018 V
18/04/2018 V
18/04/2018 V
18/04/2018 V
yes Idle Failed ### ###
yes Idle Failed ### ###
yes Idle Failed ### ###
yes Idle Failed ### ###
yes Idle Failed ### ###

Communications:
Success setup.vfukprepaid#SMSMessage
ERROR setup.vfukprepaid#SMSMessage
New setup.vfukprepaid#SMSMessage3
NewInProcess setup.vfukprepaid#SMSMessage
New setup.vfukprepaid#SMSMessage
Success setup.vfukprepaid#SMSMessage2
New setup.vfukprepaid#CampaignCacheRecord
New setup.vfukprepaid#SMSMessage2
Success setup.vfukprepaid#SMSMessage3
Success setup.vfukprepaid#BVBChangeRequestGenerationCommunication

Benefits:
Success app.business_template#OutgoingCouponBenefit
FailedDelivery setup.vfukprepaid#BillingDebitBenefit
New setup.vfukprepaid#BillingDebitBenefit
NewInProcess setup.vfukprepaid#BillingDebitBenefit
Success setup.vfukprepaid#BillingDebitBenefit
New setup.vfukprepaid#BillingTopupBenefit
Success setup.vfukprepaid#BillingTopupBenefit
FailedDelivery setup.vfukprepaid#BundleActivationBenefit
Success setup.vfukprepaid#BundleActivationBenefit
FailedDelivery setup.vfukprepaid#CommercialProductRemoveAndAddBenefit
New setup.vfukprepaid#CommercialProductRemoveAndAddBenefit
Success setup.vfukprepaid#CommercialProductRemoveAndAddBenefit
FailedDelivery setup.vfukprepaid#CommercialProductRenewAddBenefit
Success setup.vfukprepaid#CommercialProductRenewAddBenefit
FailedDelivery setup.vfukprepaid#CommercialProductRenewRemoveBenefit
Success setup.vfukprepaid#CommercialProductRenewRemoveBenefit
New setup.vfukprepaid#CommercialProductRewardBenefit
Success setup.vfukprepaid#CommercialProductRewardBenefit
FailedDelivery setup.vfukprepaid#CommercialProductRewardBenefit
New setup.vfukprepaid#FivePoundWeeklyRewardBenefit
FailedDelivery setup.vfukprepaid#FivePoundWeeklyRewardBenefit
Success setup.vfukprepaid#FivePoundWeeklyRewardBenefit
New setup.vfukprepaid#GeneralPurposeMABundleRewardBenefit
Success setup.vfukprepaid#GeneralPurposeMABundleRewardBenefit
FailedDelivery setup.vfukprepaid#GeneralPurposeMABundleRewardBenefit
FailedDelivery setup.vfukprepaid#InstantBundleRewardBenefit
New setup.vfukprepaid#InstantBundleRewardBenefit
Success setup.vfukprepaid#InstantBundleRewardBenefit
Success setup.vfukprepaid#MultipleBenefitsMABenefit1
FailedDelivery setup.vfukprepaid#MultipleBenefitsMABenefit1
New setup.vfukprepaid#MultipleBenefitsMABenefit1
New setup.vfukprepaid#MultipleBenefitsMABenefit2
Success setup.vfukprepaid#MultipleBenefitsMABenefit2
FailedDelivery setup.vfukprepaid#MultipleBenefitsMABenefit2
Success setup.vfukprepaid#MultipleBenefitsMABenefit3
New setup.vfukprepaid#MultipleBenefitsMABenefit3
New setup.vfukprepaid#MultipleBenefitsMABenefit4
Success setup.vfukprepaid#MultipleBenefitsMABenefit4
New setup.vfukprepaid#TopupAndGetMABundleRewardBenefit
FailedDelivery setup.vfukprepaid#TopupAndGetMABundleRewardBenefit
Success setup.vfukprepaid#TopupAndGetMABundleRewardBenefit

[Uncategorized (Local)] yes
[Uncategorized (Local)] yes
[Uncategorized (Local)] yes
[Uncategorized (Local)] yes
[Uncategorized (Local)] yes

yes 0
yes 0
yes 0
yes 0
yes 0
Item Checked
Panel Yes
Pontis Up? Yes
Full GC? No
Communications Yes
Benefits Yes

Servers Servers are up and Running
Runtime Servers
Full GC check
Oracle DB
Runtime All RT jobs in success status
Check status of outgoing comms and benefits
System KPIs (verify according to business activity)
Check Watermark
Outcome Notes

Last Verification Date Status

18/04/2018 V
18/04/2018 V
18/04/2018 V
18/04/2018 V
18/04/2018 V
18/04/2018 V
18/04/2018 V
18/04/2018 V
18/04/2018 V
Communications:
New setup.vfieprepaid#TestSMS 2778
Success setup.vfieprepaid#SMSMessageNoPolicy 6475
Success setup.vfieprepaid#BenefitExtract2MIS 25527
Success setup.vfieprepaid#CommunicationExtract2MIS 46230
New setup.vfieprepaid#CommunicationExtract2MIS 1287
Success setup.vfieprepaid#ReplySMSMessageWithPolicy 5423
New setup.vfieprepaid#SMSMessageNoPolicy 1287
Success setup.vfieprepaid#TestSMS 53341
Success setup.vfieprepaid#SMSMessageWithPolicy 34332
Success setup.vfieprepaid#ContactHistoryDataExtractor 39755

Benefits:
New setup.vfieprepaid#CreditAllDAccounts 1330
Success setup.vfieprepaid#Provision4GBenefit 24905
Success setup.vfieprepaid#CreditMainBalance 425
New setup.vfieprepaid#PSOServiceDetailsBenefit28Days 16
Failed setup.vfieprepaid#Provision4GBenefit 9
Success setup.vfieprepaid#DebitMainBalance 247
Success setup.vfieprepaid#CreditAllDAccounts 22824
Success setup.vfieprepaid#ProvisionYOLOBenefit 1149
Failed setup.vfieprepaid#DebitMainBalance 9
Success setup.vfieprepaid#TuGetMACreditAllDAccount 908
New setup.vfieprepaid#ProvisionYOLOBenefit 1
Success setup.vfieprepaid#PSOServiceDetailsBenefit28Days 11232
Failed setup.vfieprepaid#CreditAllDAccounts 2
PendingNotification setup.vfieprepaid#PSOServiceDetailsBenefit28Days 11
Success setup.vfieprepaid#CreditLimitedDAccount 12
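To spot the non-success rows in a status listing like the ones above, a small awk filter can help. This is a minimal sketch, assuming the listing has been copied into a plain-text file with one "Status Type Count" row per line; flag_non_success is a hypothetical helper, not part of the Pontis tooling.

```shell
#!/bin/sh
# flag_non_success [FILE...]  (hypothetical helper)
# Reads a "Status Type [Count]" listing from FILE(s) or stdin, prints every
# row whose status is not "Success", then the total count in Failed* states.
flag_non_success() {
  awk '
    # Skip blank lines and section headers such as "Communications :"
    NF < 2 || $2 == ":" || $1 ~ /:$/ { next }
    $1 != "Success" {
      count = (NF >= 3) ? $3 : "-"
      printf "%-20s %-60s %s\n", $1, $2, count
    }
    # Failed and FailedDelivery rows both start with "Failed"
    $1 ~ /^Failed/ && NF >= 3 { failed += $3 }
    END { printf "TOTAL_FAILED %d\n", failed }
  ' "$@"
}
```

For example, "flag_non_success vfie_status.txt" (file name illustrative) prints the New, PendingNotification and Failed rows plus a TOTAL_FAILED line that can be compared with the previous day.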
Item Checked
Panel Yes
Pontis Up? Yes
Full GC? No
Communications Yes
Benefits Yes

Servers Servers are up and Running
Runtime Servers
Full GC check
Oracle DB
Runtime All RT jobs in success status
Check status of outgoing comms and benefits
System KPIs (verify according to business activity)
Check Watermark
Outcome Notes

Last Verification Date Status

18/04/2018 V
18/04/2018 V
18/04/2018 V
18/04/2018 V
18/04/2018 V
18/04/2018 V
18/04/2018 V
18/04/2018 V
18/04/2018 V
Communications:
New setup.o2ukprepaid#SMSMessageProactive 2
New setup.o2ukprepaid#SMSOutgoingMessage 126
Success setup.o2ukprepaid#BenefitSuccessOutgoingMessage 3888
Success setup.o2ukprepaid#SMSMessageProactive 3337
Failed setup.o2ukprepaid#SMSMessageProactive 97
PendingNotification setup.o2ukprepaid#SMSMessageProactive 251963
Failed setup.o2ukprepaid#SMSMessageInformative 2
New setup.o2ukprepaid#BenefitSuccessOutgoingMessage 86
Success setup.o2ukprepaid#SMSOutgoingMessage 411723
PendingNotification setup.o2ukprepaid#OptinSMSCOutgoingMessage 8
PendingNotification setup.o2ukprepaid#SMSMessageInformative 232
Success setup.o2ukprepaid#SMSMessageInformative 1852
Success setup.o2ukprepaid#OptinSMSCOutgoingMessage 107
Success setup.o2ukprepaid#UsageInformativeSMS 2

Benefits:
PendingNotification setup.o2ukprepaid#MicroBoltonBenefit 1
Success setup.o2ukprepaid#MicroBoltonBenefit 36
Success setup.o2ukprepaid#OfferActivation 1972
PendingNotification setup.o2ukprepaid#OfferActivation 6
Success setup.o2ukprepaid#VirtualMoneyOfferActivation 62
Success setup.o2ukprepaid#OfferActivationByNBD 642
FailedDelivery setup.o2ukprepaid#OfferActivationByNBD 4
FailedDelivery setup.o2ukprepaid#MicroBoltonBenefit 2
PendingNotification setup.o2ukprepaid#OfferActivationByNBD 15
PendingNotification setup.o2ukprepaid#VirtualMoneyOfferActivation 1
Success setup.o2ukprepaid#LoyaltyCredit 522
Item Checked
Panel Yes
Pontis Up? Yes
Full GC? No
Communications Yes
Benefits Yes

Servers Servers are up and Running
Runtime Servers
Full GC check
Oracle DB
Runtime All RT jobs in success status
Check status of outgoing comms and benefits
System KPIs (verify according to business activity)
Check Watermark
Outcome Notes
Delayed - finished successfully

Last Verification Date Status

18/04/2018 V
18/04/2018 V
18/04/2018 V
18/04/2018 V
18/04/2018 V
18/04/2018 V
18/04/2018 V
18/04/2018 V
18/04/2018 V
Communications:

Benefits:
Component Verification
Servers Servers are up and Running:

Runtime Servers

MapR Servers


Hive DB

Oozie All Oozie jobs in success status

Check that the Map/Reduce job completed successfully

MapR Dashboard All server identifiers need to be green and no alarms appear

Disk space usage on all nodes is at most 70%

Dashboard includes 2 topologies

Analytics calculations job should run no more than 7 hours

Cluster disk space usage is no more than 60%

No MapR services are in failed status

Runtime All RT jobs in success status

Check status of outgoing comms and benefits

Check customer feed

Extract files

Decisioning finished successfully

Check there is no backlog

How to check

pontisserver service is up on all RT servers: "service pontisserver status"

No unavailable servers in mapr dashboard

Server is up and the mysql service is up

No alarms in mapr dashboard

Check in the Oozie console, in the Workflow tab, that all recent jobs are in success status

Check in oozie console that the job "pontis-mapreduce-wf" finished successfully

Check in mapr dashboard

Connect to mapr servers and run "df -h"

Check in mapr dashboard there are 12 nodes in /data/default-rack and 14 in /hbase

Search for the job "Analytics calculation" in job tracker, click on it and check the duration

Check in mapr dashboard under "Cluster Utilization"

Check in mapr dashboard under "services"

Go to the Jobs screen in the Pontis desktop and filter by the "Last Run" column to check that there are no failed jobs

Check that there are no old files in the sending queue /opt/pontis/data/server/queue/outbound/work/

Make sure the snapshot was loaded; search for it in: /opt/pontis/data/server/cdr/loaded/subscriber_profile/<year>/<month>/<da

Make sure that 5 new extracts were created and sent to Claro in: /opt/pontis/data/server/outgoing/extracts/data/

In mysql run the following query: select run_id,RUN_STATUS,RUN_DATE,LAST_UPDATE_DATE from dcs_run_data order by run_

Run the following commands and make sure the last folder in each priority matches the current hour (change to current date

"cd /opt/pontis/data/server/queue/cdr/processed"
"ll 100/20160114 150/20160114 200/20160114 300/20160114 600/20160114"
How to solve

If pontisserver is not up:

1. Check the Pontis log for any errors that caused the server to stop.
2. Find the reason for the error and check whether it was fixed.
3. Start the Pontis service with the following command: service pontisserver start
4. Make sure there are no errors in the Pontis log while the service starts.
5. Go to the desktop application, search for this server and make sure its jobs are running with status success after the restart.
6. Make sure all the file systems on the server have free disk space, using the command df -h.
In case any of the MapR servers is down (red in the dashboard), contact infra team asap.
1. In case the mysql server is down, contact infra team.
2. If only the service is down, check in the logs what happened.
3. Make sure all the file systems have free disk space.
4. Start the mysql service by running: service mysql start
5. If it still does not start, contact infra team.
1. In case you are unable to work with Hue, restart the Hue service from the MapR dashboard.
2. If it is still not OK, contact infra team.
1. If there is a failed job, check the Oozie logs for the reason.
2. If the job succeeded after the failure, keep following it and make sure it does not fail again.
3. If the reason is a lock file, delete the relevant lock file. The relevant path can be found in the Oozie log.
4. Otherwise, consult Pontis team.

In case of alerts, contact infra team.

If usage is > 70%:
1. Check what can be deleted: old logs, temp files and core dump files.
2. In case of any doubt, please consult Pontis before deleting.
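Both the 70% check and the search for deletable files can be scripted. This is a minimal sketch, assuming GNU coreutils and findutils on the MapR nodes; disk_over_threshold and list_cleanup_candidates are hypothetical helpers and the file-name patterns are assumptions. The cleanup helper only lists files, in line with consulting Pontis before deleting anything.

```shell
#!/bin/sh
# disk_over_threshold [LIMIT]  (hypothetical helper)
# Reads POSIX "df -P" output on stdin and prints each mount point whose use
# exceeds LIMIT percent (default 70). Returns 1 if any were found.
disk_over_threshold() {
  limit=${1:-70}
  awk -v limit="$limit" '
    NR == 1 { next }                  # skip the df header line
    {
      use = $5
      sub(/%$/, "", use)              # "83%" -> 83
      if (use + 0 > limit + 0) { print $6 " " use "%"; found = 1 }
    }
    END { exit (found ? 1 : 0) }
  '
}

# list_cleanup_candidates DIR [DAYS]  (hypothetical helper)
# Lists (never deletes) log, temp and core dump files older than DAYS days
# (default 7) under DIR, largest first, with their disk usage in KiB.
# The name patterns are assumptions; adjust them to the real layout.
list_cleanup_candidates() {
  dir=$1
  days=${2:-7}
  find "$dir" -type f \( -name '*.log*' -o -name '*.tmp' -o -name 'core*' \) \
    -mtime +"$days" -exec du -k {} + 2>/dev/null | sort -rn
}
```

On each node, "df -P | disk_over_threshold 70" prints only the file systems that need attention, and "list_cleanup_candidates /opt/mapr/logs 7" gives a review list (the path is just an example taken from the failed-services procedure).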
If the dashboard does not show 2 topologies, contact infra team asap.
1. Search for the job "Analytics calculations" in the job tracker.
2. Check whether there were any failed map or reduce tasks.
3. Check the reason the tasks failed.
4. Ask infra team to check the failures.
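If you only have the start and end times of the "Analytics calculations" run, the 7-hour limit can be checked with simple arithmetic. This is a sketch assuming GNU date and timestamps it can parse; duration_within_limit is a hypothetical helper.

```shell
#!/bin/sh
# duration_within_limit START END [LIMIT_HOURS]  (hypothetical helper)
# Prints the run duration in minutes and returns 0 if it is within
# LIMIT_HOURS (default 7). START and END must be parseable by GNU date,
# e.g. "2018-04-18 01:00:00"; both are read as UTC, so only the difference matters.
duration_within_limit() {
  start=$(date -u -d "$1" +%s) || return 2
  end=$(date -u -d "$2" +%s) || return 2
  limit_h=${3:-7}
  mins=$(( (end - start) / 60 ))
  echo "$mins"
  [ "$mins" -le $(( limit_h * 60 )) ]
}
```

A run from 01:00 to 07:30 prints 390 and succeeds; a run ending at 08:30 (450 minutes) returns 1.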

In case cluster disk usage is > 60%:

1. Delete old files from /mapr/archive.
2. Delete old snapshots from the MapR dashboard.
3. Make sure disk space usage is decreasing.
In case of failed services:
1. Click on the failed service to see which node it is running on.
2. Connect to the node shell and check the relevant logs for the failure reason (/opt/mapr/logs/*).
3. Contact infra in order to recover the service.
If there are failed jobs:
1. Check on which server the job failed and connect to that server's shell.
2. Check the Pontis log for the failure reason.
3. Filter the log with grep -i on the job name.
4. If you can fix the problem, fix it and make sure the next run succeeds.
5. If not, consult Pontis team.
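Steps 2 and 3 can be combined into one filter. This is a minimal sketch: the checklist does not name the Pontis log file, so the path you pass in is your own, the error keywords are assumptions about the log wording, and job_log_errors is a hypothetical helper.

```shell
#!/bin/sh
# job_log_errors LOGFILE JOBNAME [N]  (hypothetical helper)
# Shows the last N (default 20) log lines that mention JOBNAME
# (case-insensitive fixed-string match, like "grep -i") and also look like
# errors. The keywords below are assumptions about the log wording.
job_log_errors() {
  log=$1
  job=$2
  n=${3:-20}
  grep -i -F -- "$job" "$log" | grep -i -E 'error|exception|fail' | tail -n "$n"
}
```

For example, job_log_errors <pontis log file> "cdr processing from files" shows the latest error lines for that job.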
If yes (there are old files in the queue):
1. In the desktop, make sure the relevant outgoing jobs are working and not failing.
2. Make sure there are no errors in the Pontis log on the relevant server regarding the outgoing job type.
3. Check connectivity to the relevant interface by running: telnet <ip> <port>
4. Check in the Pontis log on the relevant server that the job is working (grep -i by job name).
5. Make sure that the number of files in the queue is decreasing.
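Checking for old files and confirming that the queue is draining can be scripted against the sending queue path from the check above. This is a sketch under assumptions: the checklist does not define "old", so the 60-minute default is hypothetical, as are both function names; GNU find is assumed.

```shell
#!/bin/sh
# Sending queue path from the "no old files" check.
QUEUE_DIR=/opt/pontis/data/server/queue/outbound/work

# old_queue_files [DIR] [MINUTES]  (hypothetical helper)
# Lists files in the queue older than MINUTES (default 60, an assumption:
# the checklist does not define "old").
old_queue_files() {
  find "${1:-$QUEUE_DIR}" -type f -mmin +"${2:-60}"
}

# queue_is_draining [DIR] [SECONDS]  (hypothetical helper)
# Counts queued files twice, SECONDS apart (default 60), and succeeds if the
# count went down or the queue is already empty.
queue_is_draining() {
  dir=${1:-$QUEUE_DIR}
  interval=${2:-60}
  before=$(find "$dir" -type f | wc -l)
  sleep "$interval"
  after=$(find "$dir" -type f | wc -l)
  echo "before=$before after=$after"
  [ "$after" -lt "$before" ] || [ "$after" -eq 0 ]
}
```

Counting twice a minute apart gives a direct answer to step 5; a longer interval smooths out bursts of new messages.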
If not, check why the file was not available on the Claro NAS server, and make sure the file is delivered.
1. Go to desktop --> Reports and choose the relevant missing extract.
2. If the last line is in status "failed", click on it and choose "release for execution". The extract will be created.
3. If the last line is from the day before, copy it, choose the correct dates and choose "release for execution". The extract will be created.
4. Wait one hour and make sure the extract was created and is available in the path.
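To confirm that the expected 5 extracts were created today, count the files modified since midnight in the extracts folder. This is a sketch assuming GNU find (-newermt) and that each extract is a single regular file directly in that folder; count_todays_extracts is a hypothetical helper.

```shell
#!/bin/sh
# count_todays_extracts [DIR] [EXPECTED]  (hypothetical helper)
# Counts regular files directly in DIR modified since local midnight and
# succeeds if there are at least EXPECTED of them (default 5).
count_todays_extracts() {
  dir=${1:-/opt/pontis/data/server/outgoing/extracts/data}
  expected=${2:-5}
  n=$(find "$dir" -maxdepth 1 -type f -newermt "$(date +%Y-%m-%d)" | wc -l)
  echo "$n of $expected extracts created today"
  [ "$n" -ge "$expected" ]
}
```

Run it after the one-hour wait in step 4; a non-zero return means at least one extract is still missing.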
If failed:
1. Check the reason in "run_status" in mysql:
2. If Oozie failed, search the Oozie workflows for the job "hive-query-wf" that started at the same time as the decisioning run.
3. Check the job's log for the failure reason.
4. For any other reason, check the Pontis log of the CDR servers for errors related to the visual rules RT job.
5. Make sure the servers are not stuck, and that other jobs are running successfully in the desktop.
6. Consult Pontis team about rerunning the process.
In case of backlog, check the processing status:
1. Check the "time to process queue" parameter of the CDR processing service in Icinga and make sure it is decreasing.
2. Check that the "cdr processing from files" jobs are active and running on all CDR servers.
3. Make sure that the servers are actually processing by running the following command on each CDR server:
tail -f /opt/pontis/logs/server/sla_buckets.log | grep -i "processed files"
One or more lines should appear every minute with the processing data.
The last number is the average time in milliseconds; the number before it is the number of events.
The average should be less than 400 milliseconds, and the number of events should be between 10,000 and 100,000 per minute (total).
4. If one or more of the servers are not processing, check the Pontis log for errors or stuck jobs.
5. Check the processing rate of each server in the Grafana dashboard to identify the problematic servers.
6. If a server is stuck, restart the Pontis server with the following commands:
service pontisserver stop
When "ps -ef | grep '[j]ava'" returns no output, the service is down (the [j] keeps grep from matching its own process). Then run:
service pontisserver start
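The backlog check (last folder per priority) and step 3 above can be scripted. This is a minimal sketch with two hypothetical helpers. check_processed_hours assumes the sub-folders under each <priority>/<YYYYMMDD> folder are named by two-digit hour, which this checklist does not state, so confirm the real naming first. check_sla_lines assumes the last two fields of each "processed files" line are the event count and the average milliseconds, and that "(total)" means events summed over all CDR servers for one minute.

```shell
#!/bin/sh
# check_processed_hours BASE DAY HOUR  (hypothetical helper)
# For each CDR priority folder used in the "ll" check (100 150 200 300 600),
# compare the newest sub-folder under BASE/<priority>/DAY with HOUR.
# ASSUMPTION: sub-folders are named by two-digit hour (00-23).
check_processed_hours() {
  base=$1 day=$2 hour=$3
  rc=0
  for prio in 100 150 200 300 600; do
    dir="$base/$prio/$day"
    if [ ! -d "$dir" ]; then
      echo "$prio: MISSING $dir"
      rc=1
      continue
    fi
    # Zero-padded hour names sort correctly as plain text
    last=$(ls -1 "$dir" | sort | tail -n 1)
    if [ "$last" = "$hour" ]; then
      echo "$prio: OK ($last)"
    else
      echo "$prio: BEHIND (last=$last, expected=$hour)"
      rc=1
    fi
  done
  return "$rc"
}

# check_sla_lines  (hypothetical helper)
# Reads sla_buckets.log lines on stdin: every "processed files" line must
# average below 400 ms, and the summed events must be within 10000-100000.
# ASSUMPTIONS: last two fields are <events> <avg ms>; input is one minute
# of lines collected from all CDR servers.
check_sla_lines() {
  grep -i 'processed files' | awk '
    {
      events = $(NF - 1) + 0
      avg = $NF + 0
      total += events
      if (avg >= 400) { slow++; print "SLOW: " $0 }
    }
    END {
      printf "lines=%d total_events=%d slow_lines=%d\n", NR, total, slow
      if (NR == 0 || slow > 0 || total < 10000 || total > 100000) exit 1
    }
  '
}
```

For the folder check, run check_processed_hours /opt/pontis/data/server/queue/cdr/processed "$(date +%Y%m%d)" "$(date +%H)"; for the SLA check, pipe in the lines logged during the same minute on each CDR server.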
