ADP 2.1 - Event Broker 2.0 Support Training
Support & Troubleshooting
Application Components: Event Broker Stand Alone

[Diagram: ArcSight SmartConnectors produce CEF events into the stand-alone Event Broker; ArcSight Management Center handles topic setup and route management. Consumers (ArcSight ESM, Hadoop HDFS, user-defined consumers) must be able to process the same CEF version (0.1/1.0) as specified by the connector.]
Application Components: Event Broker + Investigate

[Diagram: ArcSight SmartConnectors send CEF into Event Broker, which applies event transform and routing. Avro events flow to the ArcSight Investigate Event Database; binary events flow to ArcSight ESM (optional); other data consumers read CEF.]
Deployment Architecture: Event Broker Stand Alone

[Diagram: Connectors send CEF to Event Broker instances running on K8 worker nodes; an ArcSight K8 master node coordinates the cluster; ArcMC manages the deployment.]
Deployment Architecture: Event Broker + Investigate

[Diagram: Connectors send CEF to Event Broker instances on K8 worker nodes; Avro events flow to the ArcSight Investigate Database on a worker node. The ArcSight K8 master node hosts the ArcSight Installer; ArcMC and SMTP are attached to the deployment.]
System Topology
[Diagram: ADP system topology (10/15/2016, github.hpe.com/hercules). ADP ArcMC provides manage/monitor/admin functions; Investigate provides search; other applications consume as well. The Event Broker (EB) Streaming Platform runs Kafka brokers and a Schema Registry; EB stream processing runs event transform and event routing as Kafka stream processes. Event sources (ArcSight Connectors and other CEF/Avro sources) produce into Kafka; event consumers include ArcSight Logger, ESM, and others. FIPS, IPv6, and TLS are supported.]
EB K8S Pods
NAME READY STATUS RESTARTS AGE IP NODE
default-http-backend-w5uv6 1/1 Running 0 14d 172.77.38.6 15.214.129.100
eb-c2av-processor-927505239-xc1ol 1/1 Running 0 14d 172.77.79.4 15.214.129.102
eb-kafka-0 1/1 Running 0 14d 172.77.79.3 15.214.129.102
eb-kafka-1 1/1 Running 0 14d 172.77.66.4 15.214.129.103
eb-kafka-2 1/1 Running 0 14d 172.77.86.6 15.214.129.101
eb-kafka-manager-1775413351-4u4o3 1/1 Running 0 14d 172.77.28.3 15.214.129.103
eb-routing-processor-546396016-6g1vc 1/1 Running 0 14d 172.77.86.4 15.214.129.101
eb-schemaregistry-2895860841-hc9vo 1/1 Running 0 14d 172.77.86.3 15.214.129.101
eb-web-service-2621833535-6hji3 2/2 Running 0 14d 172.77.38.10 15.214.129.100
eb-zookeeper-0 1/1 Running 0 14d 172.77.66.3 15.214.129.103
eb-zookeeper-1 1/1 Running 0 14d 172.77.86.5 15.214.129.101
eb-zookeeper-2 1/1 Running 0 14d 172.77.79.5 15.214.129.102
hercules-management-1187739270-qfo13 2/2 Running 0 14d 172.77.38.11 15.214.129.100
hercules-rethinkdb-0 1/1 Running 0 14d 172.77.38.9 15.214.129.100
hercules-search-4175371486-rjgx6 3/3 Running 0 14d 172.77.38.14 15.214.129.100
nginx-ingress-controller-rk7o2 1/1 Running 0 14d 172.77.38.8 15.214.129.100
• EB pods for Kafka and ZooKeeper are bound to worker nodes using labels
• Can either share the same worker node or can use separate nodes
• Other EB pods are not bound to a specific worker node
• K8S will schedule them on one of the available worker nodes
# ssh 15.214.129.102
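The node bindings called out above can be read directly from the `kubectl get pods -o wide` listing. A minimal sketch (a few sample lines from the listing above are inlined so it runs standalone; in practice, pipe the live kubectl output through the same awk filter):

```shell
# Print "pod -> node" for the label-bound Kafka and ZooKeeper pods.
# Sample lines inlined; in practice: kubectl get pods -o wide | awk ...
pods='eb-kafka-0 1/1 Running 0 14d 172.77.79.3 15.214.129.102
eb-kafka-1 1/1 Running 0 14d 172.77.66.4 15.214.129.103
eb-zookeeper-0 1/1 Running 0 14d 172.77.66.3 15.214.129.103'

echo "$pods" | awk '/^eb-(kafka|zookeeper)-[0-9]/ {print $1 " -> " $7}'
```

The last column of the wide output is the node, so the filter works unchanged on a live cluster.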
EB Deployment Topology

[Diagram: nodes, pods, and container processes laid out across the K8S master node and three K8S worker nodes.]
EB + Investigate Deployment Topology

[Diagram: the K8S master node runs the arcsight-installer; the K8S worker nodes run the EB pods (eb-kafka-processor, eb-routing-processor and eb-c2av-processor with atlas_sp containers, eb-schema-registry with atlas_schema-registry, atlas_kafka_manager) plus the Investigate pods (hercules-management, hercules-search with search and search-engine containers, hercules-rethinkdb-0 with rethinkdb), with kubernetes-vault-renew containers alongside.]
EB Deployment Dependency 1/5 through 5/5

[Diagram series: five slides stepping through the deployment dependency order across the K8S master node and worker nodes, starting from the arcsight-installer on the master node.]
[Diagram: Event Broker connection and port topology. Inside the Kubernetes cluster, the master node hosts the arcsight-installer (port 8888) and atlas_kafka_manager; the worker nodes host eb-kafka-0/1/2 (atlas_kafka, ports 9092/9093), atlas_schema-registry (port 8081), the stream processors (atlas_sp), eb-kafka-processor, and the ZooKeeper cluster. Kafka Manager and cAdvisor are reached via localhost. External components include Logger (port 39000), ESM, Vertica, Investigate, and other producers and consumers. Legend: Event Broker components use TLS with FIPS enabled; Kafka Manager access is over an SSH tunnel with FIPS; some localhost connections use no TLS or FIPS.]
EB Data Storage Topology

[Diagram: node/pod layout showing host-path volumes: Kafka data under /opt/arcsight/k8s-hostpath-volume/eb/kafka, ZooKeeper data under /opt/arcsight/k8s-hostpath-volume/eb/zookeeper, and installer data under /opt/arcsight/installer/db.]

• EB Web Service and Schema Registry persist data on a Kafka topic
• Stream processors (c2av, routing) do not persist any data
EB Event Data Flow Topology

[Diagram: event data producers (Connectors) send events into the eb-kafka pods (atlas_kafka) across the K8S worker nodes. The eb-c2av-processor and eb-routing-processor (atlas_sp) apply event transforms and event routing; eb-web-service (atlas_web-service with kubernetes-vault-renew), eb-schema-registry (atlas_schema-registry), and atlas_kafka_manager support the flow. Event data consumers include Logger, ESM, and Vertica.]
Event Broker Installation
Before Setting Up Event Broker Systems
Event Broker Installation
There are different files to download, depending on the deployment environment.

Deployment environment: EB Stand Alone (ADP)
Files to download:
• arcsight-installer-1.0.0-14.rc_eb.x86_64.rpm
Purpose: Installs the ArcSight Installer for EB stand alone. The ArcSight Installer is a web application used to configure and deploy Event Broker to the environment. Images are retrieved from a remote DockerHub; customers must have credentials to successfully retrieve images during deployment.

Deployment environment: EB + Investigate (with Internet access)
Files to download:
• arcsight-installer-1.0.0-14.rc.x86_64.rpm: Installs the ArcSight Installer for EB and Investigate. The ArcSight Installer is a web application used to configure and deploy Event Broker and Investigate to the environment. Images are retrieved from a remote DockerHub; customers must have credentials to successfully retrieve images during deployment.
• arcsight-investigate-vertica-scripts.<key>.tar.gz: Installs Vertica, the Kafka Scheduler, and configures the environment.
• Vertica License (obtained independently)

Deployment environment: Offline installation for EB and Investigate (without Internet access)
Files to download:
• arcsight-installer-1.0.0-14.rc.x86_64.rpm: Installs the ArcSight Installer for EB and Investigate. The ArcSight Installer is a web application used to configure and deploy Event Broker and Investigate to the environment. For environments with NO internet access, images are retrieved locally after downloading.
• arcsight_eb_images_<key>.tar: Contains the Event Broker images.
• arcsight_investigate_images_<key>.tar: Contains the Investigate images.
• arcsight-investigate-vertica-scripts.<key>.tar.gz: Installs Vertica, the Kafka Scheduler, and configures the environment.
• Vertica License (obtained independently)
Pre-requisites – Event Broker + Investigate Systems
Installer Properties File
Master Node Location: /opt/arcsight/installer.properties
## Event Broker Kafka will use TLS Client Authentication to verify client connections
predeploy.eb.init.client-auth=false
## Kafka log retention size for the Vertica Avro topic. This topic is uncompressed and requires more space to hold events for the same duration.
predeploy.eb.init.kafkaRetentionBytesForVertica=10737418240
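For reference, the default value above is 10 GiB expressed in bytes:

```shell
# 10 GiB in bytes matches the default kafkaRetentionBytesForVertica value
echo $((10 * 1024 * 1024 * 1024))   # 10737418240
```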
Installer Properties File (continued)
Master Node Location: /opt/arcsight/installer.properties
## The message format version the broker will use to append messages to the logs.
predeploy.log.message.format.version=0.10.1.0
Installer Properties File (continued)
Master Node Location: /opt/arcsight/installer.properties
## ArcMC hostname
predeploy.eb.arcmc.hosts=localhost:443
## The endpoint identification algorithm to validate the server hostname using the server certificate.
predeploy.ssl.endpoint.identification.algorithm=https
Installer Properties File (continued)
Master Node Location: /opt/arcsight/installer.properties
#ArcSight Investigate
investigateTag=1.00.0
search.image.tag=${investigateTag}
search.engine.image.tag=${investigateTag}
management.image.tag=${investigateTag}
rethinkdb.image.tag=${investigateTag}
Adding Producers and Consumers
– Connectors
– Import Kafka Certificate into Keystore
– Add Kafka as a destination:
– “eb-cef” topic for sending CEF data (Logger, Investigate)
– “eb-esm” topic for sending event data (ESM)
– Loggers
– Sign Kafka Consumer Certificate on Event Broker
– Connect to default “eb-cef” topic on Event Broker
– Investigate
– Connect Vertica scheduler to Kafka topic
Verify that the Event Broker Cluster is Healthy
Container Dependency Order
After deploying Event Broker, pods are configured to start in the following order. Downstream pods will not
start until the dependencies are met.
– A quorum of zookeeper pods in the cluster must be up (2 of 3, or 3 of 5). The total number of zookeepers must be odd.
– All Kafka pods must be up
– Schema Registry pod must be up
– Bootstrap Web Service, Kafka Manager
– Transformation Stream Processor (C2AV), Routing Stream Processor
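The quorum rule above is floor(N/2) + 1 for an ensemble of N ZooKeeper pods, which is also why the total must be odd: an even-sized ensemble tolerates no more failures than the next smaller odd one. A quick shell sketch:

```shell
# Minimum number of ZooKeeper pods that must be up for an ensemble of size N
quorum() { echo $(( $1 / 2 + 1 )); }

quorum 3   # 2 of 3
quorum 5   # 3 of 5
quorum 4   # 3 of 4 -- same failure tolerance as a 3-node ensemble
```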
Pod Status: A Healthy Cluster
# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
default-http-backend-yjcwc 1/1 Running 0 17h 172.77.40.6 15.214.137.102
eb-c2av-processor-967417906-vzu2k 1/1 Running 0 16h 172.77.16.5 15.214.137.112
eb-kafka-0 1/1 Running 1 16h 172.77.16.3 15.214.137.112
eb-kafka-1 1/1 Running 1 16h 172.77.59.6 15.214.137.113
eb-kafka-2 1/1 Running 0 16h 172.77.40.11 15.214.137.102
eb-kafka-manager-3416293552-otw4w 1/1 Running 0 16h 172.77.16.4 15.214.137.112
eb-routing-processor-965434368-m5bme 1/1 Running 0 16h 172.77.59.5 15.214.137.113
eb-schemaregistry-2463124937-0r27o 1/1 Running 1 16h 172.77.59.4 15.214.137.113
eb-web-service-3440844888-mdwnb 2/2 Running 0 4h 172.77.40.12 15.214.137.102
eb-zookeeper-0 1/1 Running 0 16h 172.77.59.3 15.214.137.113
eb-zookeeper-1 1/1 Running 0 16h 172.77.16.6 15.214.137.112
eb-zookeeper-2 1/1 Running 0 16h 172.77.40.10 15.214.137.102
nginx-ingress-controller-we1fi 1/1 Running 0 17h 172.77.40.8 15.214.137.102
Pod Status: An Unhealthy Cluster
# kubectl get pods
NAME READY STATUS RESTARTS AGE
Check Kafka Scheduler on Vertica
# ./install-vertica/kafka_scheduler status
Verify the topic partition count and replication count
Why it is important:
– Check that the configured partition count matches what you expect it to be.
– Check the replication count and partition count for the topic using Event Broker Manager (Kafka Manager) or using the kafka-topics command line:
# kubectl exec eb-zookeeper-0 -- kafka-topics --zookeeper localhost:2181
--describe --topic eb-cef
Topic:eb-cef PartitionCount:5 ReplicationFactor:2 Configs:
Topic: eb-cef Partition: 0 Leader: 1002 Replicas: 1002,1003 Isr: 1002,1003
Topic: eb-cef Partition: 1 Leader: 1003 Replicas: 1003,1001 Isr: 1003,1001
Topic: eb-cef Partition: 2 Leader: 1001 Replicas: 1001,1002 Isr: 1001,1002
Topic: eb-cef Partition: 3 Leader: 1002 Replicas: 1002,1001 Isr: 1002,1001
Topic: eb-cef Partition: 4 Leader: 1003 Replicas: 1003,1002 Isr: 1003,1002
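One way to read this output for trouble is to compare the Replicas and Isr columns: a partition whose in-sync replica list is shorter than its replica list is under-replicated. A small sketch that scans `kafka-topics --describe` output (sample lines inlined; the field positions assume the format shown above):

```shell
# Flag partitions where the Isr list is shorter than the Replicas list.
describe='Topic: eb-cef Partition: 0 Leader: 1002 Replicas: 1002,1003 Isr: 1002,1003
Topic: eb-cef Partition: 1 Leader: 1003 Replicas: 1003,1001 Isr: 1003'

echo "$describe" | awk '/Partition:/ {
    n_rep = split($8, r, ","); n_isr = split($10, i, ",")
    if (n_isr < n_rep) print "under-replicated: partition " $4
}'
```

In the sample, partition 1 has lost replica 1001 from its in-sync set and is flagged.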
Software Logs and Data
Licensing
Troubleshooting – Installation
Event Broker pods show multiple restarts
This is normal.
Pods will restart as they attempt to synchronize with other pods.
Restarts should cease shortly after all pods in the EB cluster have deployed on all servers.
The number of restarts should be low: fewer than 10 in most cases.
The main factor affecting the number of restarts is the speed at which servers can connect and download containers.
Some pods are not starting, with status ErrImagePull
Problem: This indicates that the image cannot be downloaded from DockerHub. This can be confirmed by
running kubectl get pods, and then executing kubectl describe pod podname.
You will see a message similar to the following:
Failed to pull image "hub.docker.io/hercules/search-engine:master" net/http: request
canceled
Solution: The pod will need to be deleted to re-trigger a download of the image.
– Execute the command: kubectl delete pod failing-podname
– This will terminate the failing pod and create a new one with a different name.
– Make sure the image pull is successful by running kubectl get pods to see the status of the
newly recreated pod.
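The delete-and-recheck step can be scripted for every pod stuck in ErrImagePull. A sketch, with sample listing data inlined and the actual delete left commented out:

```shell
# Find pods whose STATUS column is ErrImagePull and (optionally) delete them.
# Sample data; in practice: pods=$(kubectl get pods --no-headers)
pods='eb-web-service-2621833535-6hji3 2/2 Running 0 14d
hercules-search-4175371486-rjgx6 3/3 ErrImagePull 0 14d'

echo "$pods" | awk '$3 == "ErrImagePull" {print $1}' |
while read -r pod; do
    echo "would delete: $pod"
    # kubectl delete pod "$pod"   # replacement pod re-triggers the image pull
done
```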
Multiple Kafka Crashes/Restarts
If data is not removed from a machine prior to re-installation, and the Kafka cluster has been reconfigured,
then Kafka brokers may launch with duplicate IDs, causing one of the Kafka nodes to fail to start.
To identify the issue: Look for the following log in one of the Kafka nodes.
[2017-04-09 14:56:06,772] FATAL [Kafka Server 1001], Fatal error during KafkaServer startup. Prepare to shutdown
(kafka.server.KafkaServer)
To verify the issue: Connect to each system that is running a Kafka broker and check the assigned broker.id
value of each. The broker.id value defined on each Kafka node must be unique.
# ssh worker_node_1 cat /opt/arcsight/k8s-hostpath-volume/eb/kafka/meta.properties | grep id
broker.id=1001
broker.id=1001
broker.id=1002
To recover: If you are reinstalling the cluster, delete the existing data directory /opt/arcsight/ as part of
uninstalling the original install. If you are re-labeling or updating an existing cluster, make sure the cluster
labels match the original worker node for each Kafka node, without conflicts.
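The per-node uniqueness check can be automated: collect the broker.id values (for example with the ssh/cat command shown above) and look for duplicates. A sketch using the sample values shown:

```shell
# Detect duplicate broker.id values collected from each Kafka node's meta.properties.
# Sample data; in practice gather one line per node over ssh.
ids='broker.id=1001
broker.id=1001
broker.id=1002'

dupes=$(echo "$ids" | cut -d= -f2 | sort | uniq -d)
if [ -n "$dupes" ]; then
    echo "duplicate broker.id(s): $dupes"
fi
```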
Troubleshooting – Other
Event Broker and Vertica Diagnosis scripts
Diagnostic tools are packaged in the Event Broker “Web Service” container; they extract logs and other cluster information
that can be used to investigate issues.
# find web service container
$ docker ps | grep -i atlas_web-service
c226ee041c48 hub.docker.hpecorp.net/hercules/atlas_web-service:latest
Cannot Query Zookeeper
Symptom:
When you run the kubectl get pods command to get the status of the pods, you see that downstream
pods (see the pod dependency order) do not stay up and that their status is a 'CrashLoop'-type error.
Conditions to look for:
Check that zookeeper pods are running.
– If the zookeeper pod status is Pending, you may not have labeled the nodes (zk=yes). Verify that the
nodes are labeled using the kubectl get nodes -L=zk command.
– Verify that you configured an odd number of zookeepers in installer.properties
predeploy.eb.zookeeper.count attribute.
– Check the zookeeper pod logs for errors using the kubectl logs <pod name> command.
Common Errors/Warnings in Zookeeper Logs
– Quorum Exceptions: Cannot elect a leader. If you see this type of error, check the conditions described in
‘Cannot query zookeeper’
– Socket errors: This can occur if there are too many connections.
– The solution is to restart the pod using kubectl delete pod <pod_name>.
– The pod will be recreated automatically.
Communication Errors
– SSL Connection Errors: These warnings occur if there is a connection issue between Kafka and a
consumer or producer. Check the steps that you used to import certificates to both EB and consumers.
– Communication between brokers: If you see this type of error, host names may not be configured properly. It
is possible that the node cannot perform reverse lookup or that DNS is not set up properly.
A consumer cannot read events from EB
– If this is a new setup of the Kafka scheduler, check that the Kafka scheduler is configured to communicate on
port 39092.
– If this was working at first, but stopped working, it is possible that the offset value is not recognized:
– In this scenario, the Kafka scheduler fails to recognize the offset IDs of messages that are in the topic. It can happen if the
Kafka scheduler unexpectedly stops reading from the topic, and then is restarted.
– Solution: Execute the kafka_scheduler delete command to delete the metadata. After doing this, immediately run the
kafka_scheduler create command to set up the scheduler.
– Other items to check:
– Check the network connection.
– Check whether the Kafka pods are down.
– Check that you configured the consumer to communicate with all nodes running Kafka. If you specified a connection to
only one node in the cluster and that node is down, events will not flow.
– If you are encountering SSL connection errors as well, check the steps that you used to import certificates to both EB
and consumers.
An EB component crashes: web service, stream processors, etc.
– If this happens at start-up, check the container dependency order. Have any of the dependency pods failed to
start, or crashed?
– Check memory: Does the system have enough memory and disk space? It is possible that the component
requires more memory than the system has available.
– Check whether there are too many open sockets.
Pods will not start after node is shut down for more than 6 hours
This issue occurs after a system has been down for more than 6 hours and is related to an expired certificate: if nodes
are down for more than 6 hours, certificates are not renewed. The workaround:
1. Connect to the master node.
2. Run the update_kubevaulttoken script.
# /opt/arcsight/kubernetes/bin/update_kubevaulttoken
3. Check the status of the Event Broker pods. They should restart automatically.
# kubectl get pod -o wide
4. If they do not come up, then undeploy and then redeploy EB using the ArcSight Installer.
Event Broker EPS is lower than expected
– Check whether there are resource constraints on the brokers: CPU, memory, or a full disk. Check usage at the
system level or with ArcMC.
– Check for a network bottleneck.
– Check whether the Stream Processor is able to keep up with the CEF-to-Avro transformation.
In ArcMC, if the Stream Processor cannot keep up, its metric will be lower than the Connector EPS. The Stream
Processor may be constrained in some way, such as by limited system resources.
Performance
Deployment resource sizing is an important factor in Event Broker performance.
The following slides are from the Event Broker Sizing guidelines on iRock:
https://irock.jiveon.com/docs/DOC-141395
Performance
– Try to size so that production matches consumption for your SLOWEST Consumer. It's much better to
have an idle Consumer than one that cannot keep up and forces the Broker to constantly read from
physical disk.
– Throughput is limited by broker network and disk bandwidth. Keep as much as possible in memory and
consider production AND consumption bandwidth – it can get very big very quickly!
– Brokers can converge hundreds of producers into a single topic – allows for great SmartConnector scaling
(eg WINC/WUC)
– Latency is a key factor – do not attempt to Produce or Consume over a WAN link such as between data
centers. Consider separate clusters in each data center and use SmartConnectors to perform dual
destination feeds if required.
– Acknowledgement mode will cause a performance hit – consider this when determining required
throughput. Refer to the Sizing Guide on iRock for detailed examples:
https://irock.jiveon.com/docs/DOC-141395
Performance
Notes on potential bottlenecks
– Think about production + consumption EPS: you may have 10K EPS inbound, with Hadoop
AND Logger both consuming (20K EPS of consumption), so you need to size for 30K EPS.
– Hardware is PER NODE for a minimum 3 node cluster.
– This assumes NO ACK and NO TLS.
– Leader ACK introduces roughly a 66% performance impact. FULL ACK is even worse!
– Assumed 1765 byte CEF events. This is obviously a fluid value.
– Keep in mind that compression in KAFKA is performed on the Producer (e.g. the SmartConnector)
using GZIP. KAFKA itself plays no role in compression of data.
– But this becomes far more complicated when dealing with BINARY and AVRO for ESM and Vertica!
– Always recommend 10Gbit network connections INSIDE of the cluster!
https://irock.jiveon.com/docs/DOC-141395
Back Up
Pre-requisites – Vertica Systems
More Information
Thank you