
PRP FIONA Workshop

MaDDash: Monitoring and Debugging Dashboard
February 5-6, 2018


CENIC, La Mirada

John Hess, CENIC


MaDDash - Monitoring and Debugging Dashboard

● Orchestrates regular testing among several hosts (meshconfig)
● Visualizes test results
○ original use-case: displaying measurements from perfSONAR throughput (iperf) and latency / loss (owamp)
○ python script for parsing GridFTP transfer logs to register disk-to-disk throughput
○ can be extended to display other types of two-dimensional data
○ includes community-contributed traceroute viewer (Dale Carder / U. Wisconsin)
● Gathers test results from Measurement Archives (MAs)
○ MAs co-resident on pS toolkit hosts
○ central, standalone MAs
● Presents high-level dashboard view of grids (meshes) of hosts (full-mesh, disjoint, …)
● Clicking on cells within grids leads to time-series graphs, host information, traceroute
viewer
● Nagios checks may be configured to alarm based on thresholds for throughput, latency,
packet loss, …

PRP::FIONA Workshop Rehearsal 5-6 February, 2018 2


MaDDash - why regular testing is important

● Raises awareness of issues (read: symptoms) which may not have [yet, ever] been
reported
● Provides a baseline performance reference and timeline from which to correlate changes
● Complements ad hoc testing



perfSONAR MaDDash: Throughput and Packet Loss grids (disjoint)



MaDDash - dashboard grids, cells, check decoder

Upper half of cell: reflects tests from Row node toward Column node

Lower half of cell: reflects tests from Column node toward Row node

image source: https://github.com/esnet/maddash/tree/master/docs/images
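
The cell-reading rule above can be expressed as a small lookup. This is only an illustrative sketch: the function name `cell_direction` and the "upper"/"lower" labels are hypothetical and are not part of MaDDash.

```python
def cell_direction(row_host, column_host, half):
    """Return (source, destination) for one half of a dashboard cell.

    Upper half: tests from the row node toward the column node.
    Lower half: tests from the column node toward the row node.
    """
    if half == "upper":
        return (row_host, column_host)
    if half == "lower":
        return (column_host, row_host)
    raise ValueError("half must be 'upper' or 'lower'")


# Example with two hosts that appear elsewhere in these slides.
print(cell_direction("ps-lax-10g.cenic.net", "ps10g.sdsc.edu", "upper"))
print(cell_direction("ps-lax-10g.cenic.net", "ps10g.sdsc.edu", "lower"))
```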



perfSONAR MaDDash: Throughput and Packet Loss time-series graphs



perfSONAR MaDDash: Host Info and Traceroute Viewer



PRP K8s MaDDash and Traceroute visualization tools



MaDDash: develop a test plan

● Selection criteria for which hosts will participate
○ campus border
○ Science DMZ (topology close to DTN)
○ regional
● How will the participating hosts be organized into grids
○ host NIC uplink capacity
○ separate grids for IPv4 vs IPv6
● What is going to be measured
○ throughput
○ packet loss / latency
○ traceroute
● Measurement frequency, duration, other parameters to consider
○ throughput (10G: 4x / day @ 30 seconds; 100G: 2x / day @ 30 seconds)
○ packet loss / latency
○ traceroute (4x / day)
○ other knobs - IPv4-only or IPv6-only; TCP Window Size; force bi-directional
● Measurement Archive (MA)
○ distributed MAs: each pS toolkit host may (optionally) have an MA
○ central MA: typically on same host / VM / container running MaDDash server
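
When turning a "tests per day" target from the plan above into configuration values, the arithmetic can be sketched as follows. The helper names are hypothetical; the only inputs taken from the slide are the 4x / day and 2x / day frequencies and the 30-second duration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def interval_seconds(tests_per_day):
    """Seconds between test starts for an evenly spaced schedule."""
    if tests_per_day <= 0:
        raise ValueError("tests_per_day must be positive")
    return SECONDS_PER_DAY // tests_per_day


def daily_test_seconds(tests_per_day, duration_seconds):
    """Total seconds per day a host pair spends running the test."""
    return tests_per_day * duration_seconds


for label, per_day in (("10G", 4), ("100G", 2)):
    print(label, interval_seconds(per_day), daily_test_seconds(per_day, 30))
```

Under these assumptions, 4x / day corresponds to an interval of 21600 seconds and 2x / day to 43200 seconds.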

MaDDash: installation & configuration (during lab)

● Recipes for rpm bundle installation on CentOS: http://docs.perfsonar.net/install_centos.html

● perfsonar-centralmanagement bundle will install a standalone MaDDash + esmond Measurement Archive (MA)

● Create and publish a mesh configuration file (json)
○ consumed by MaDDash server for setting up dashboards of groups of nodes
○ consumed by participating pS nodes for initiating regular tests

● Set up MA for receiving results from pS nodes and DTNs

● Configure aspects of the MaDDash webUI

● Configure MaDDash server to display results from pS nodes and DTNs

● Set up pS nodes and DTNs to initiate tests and register results to the MA



MaDDash: packages

● maddash
○ Container package that has dependencies on the maddash-server, maddash-webui, and
perl-perfSONAR_PS-Nagios packages. The package itself does not install any additional
software; it simply pulls in the aforementioned packages.
● maddash-server
○ The backend server that schedules checks and makes results available via a REST/JSON
interface running on an embedded web server. This package has a dependency on java which will
also be installed during the yum installation process.
● maddash-webui
○ The web pages that display the dashboard. It consists of a set of CGI scripts that run under
Apache. The server contacts the REST server run by the maddash-server package and then
presents the data on the web page.
● nagios-plugins-perfsonar
○ Installs the perfSONAR Nagios checks that can alarm based on throughput, loss and other data
returned by perfSONAR services.
● perfsonar-graphs
○ Provides the performance graphs used by the maddash-webui package for perfSONAR checks.

source: http://software.es.net/maddash/install.html



MaDDash: configuration files

● dashboard.conf has global defaults and four main sections
○ organization, site, host
○ test specifications (test_spec)
○ groups (organizes hosts into groups: mesh, disjoint)
○ tests (pulls together a test_spec and applies it to a group)
● mesh.json produced by tools processing dashboard.conf
○ consumed by MaDDash GUI agent tools to create server YAML configuration
○ consumed by participating pS nodes to initiate tests
● /etc/maddash/maddash-webui/config.json settings for
○ dashboard title
○ default dashboard
○ color scheme for thresholds
● /etc/perfsonar/meshconfig-guiagent.conf settings for
○ check interval, time range
○ performance threshold values for acceptable, warning, critical
● /etc/maddash/maddash-server/maddash.yaml generated from mesh.json
○ depending on use-case may require manual configuration (e.g. GridFTP)



CENIC-dashboard.conf -- global defaults

#########################################################################################
# Mesh Config file for CENIC
#
###

# top level / global defaults


description CENIC

<administrator>
name John Hess
email jhess@cenic.org
</administrator>

<measurement_archive>
type perfsonarbuoy/bwctl
read_url https://ps-ma-lax.cenic.net/esmond/perfsonar/archive/
write_url https://ps-ma-lax.cenic.net/esmond/perfsonar/archive/
</measurement_archive>
<measurement_archive>
type traceroute
read_url https://ps-ma-lax.cenic.net/esmond/perfsonar/archive/
write_url https://ps-ma-lax.cenic.net/esmond/perfsonar/archive/
</measurement_archive>
<measurement_archive>
type perfsonarbuoy/owamp
read_url https://ps-ma-lax.cenic.net/esmond/perfsonar/archive/
write_url https://ps-ma-lax.cenic.net/esmond/perfsonar/archive/
</measurement_archive>



CENIC-dashboard.conf -- organization, site, host

## SDSC
<organization>
description SDSC

<site>
description SDSC
<host>
description ps10g.sdsc.edu
address ps10g.sdsc.edu
address nate.sdsc.edu
address 192.12.207.22
address 2001:48d0:100:1::22
</host>
</site>

</organization>



CENIC-dashboard.conf -- host-specific MAs

<host>
description ps-40g-scidmz-0.tools.ucla.net
address ps-40g-scidmz-0.tools.ucla.net
address 2607:f010:3f9:8004::ea
<measurement_archive>
type perfsonarbuoy/bwctl
read_url https://perfsonar.noc.ucla.edu/esmond/perfsonar/archive
write_url https://perfsonar.noc.ucla.edu/esmond/perfsonar/archive
</measurement_archive>
<measurement_archive>
type traceroute
read_url https://perfsonar.noc.ucla.edu/esmond/perfsonar/archive
write_url https://perfsonar.noc.ucla.edu/esmond/perfsonar/archive
</measurement_archive>
<measurement_archive>
type perfsonarbuoy/owamp
read_url https://perfsonar.noc.ucla.edu/esmond/perfsonar/archive
write_url https://perfsonar.noc.ucla.edu/esmond/perfsonar/archive
</measurement_archive>
</host>



CENIC-dashboard.conf -- no_agent hosts

<host>
description speedtest2.pnl.gov
address speedtest2.pnl.gov
no_agent 1
<measurement_archive>
type perfsonarbuoy/bwctl
read_url https://ps-ma-lax.cenic.net/esmond/perfsonar/archive/
write_url https://ps-ma-lax.cenic.net/esmond/perfsonar/archive/
</measurement_archive>
<measurement_archive>
type traceroute
read_url https://ps-ma-lax.cenic.net/esmond/perfsonar/archive/
write_url https://ps-ma-lax.cenic.net/esmond/perfsonar/archive/
</measurement_archive>
<measurement_archive>
type perfsonarbuoy/owamp
read_url https://ps-ma-lax.cenic.net/esmond/perfsonar/archive/
write_url https://ps-ma-lax.cenic.net/esmond/perfsonar/archive/
</measurement_archive>
</host>



CENIC-dashboard.conf -- test specifications (test_spec)

<test_spec bwctl_8h_tcp_test_v6>
# Define a test spec for testing achievable bandwidth once every 8 hours
type perfsonarbuoy/bwctl # Perform a bwctl test (i.e. achievable bandwidth)
tool bwctl/iperf3 # Use 'iperf3' to do the bandwidth test
protocol tcp # Run a TCP bandwidth test
interval 28800 # Run the test every 8 hours (21600 would be every 6 hours)
ipv6_only 1 # force ipv6 only
duration 30 # Perform a 30 second test
force_bidirectional 1 # do bidirectional test
random_start_percentage 25 # randomize start time
omit_interval 5 # ignore first few seconds of test
window_size 134217728 # set 128MB TCP window
</test_spec>

<test_spec bwctl_8h_tcp_test>
# Define a test spec for testing achievable bandwidth once every 8 hours
type perfsonarbuoy/bwctl # Perform a bwctl test (i.e. achievable bandwidth)
tool bwctl/iperf3 # Use 'iperf3' to do the bandwidth test
protocol tcp # Run a TCP bandwidth test
interval 28800 # Run the test every 8 hours (21600 would be every 6 hours)
ipv4_only 1 # force ipv4 only
duration 30 # Perform a 30 second test
force_bidirectional 1 # do bidirectional test
random_start_percentage 25 # randomize start time
omit_interval 5 # ignore first few seconds of test
window_size 134217728 # set 128MB TCP window
</test_spec>
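
The numeric values in a test spec are easier to review once converted to familiar units. The sketch below is a hypothetical helper, not a perfSONAR tool; it takes the interval, duration, and window_size values shown in the specs above and reports hours between tests, tests per day, and the window size in MiB.

```python
def describe_test_spec(interval_s, duration_s, window_bytes):
    """Convert raw test_spec numbers into human-friendly units."""
    return {
        "hours_between_tests": interval_s / 3600,
        "tests_per_day": 86400 / interval_s,
        "duration_seconds": duration_s,
        "window_mib": window_bytes / (1024 * 1024),
    }


# Values copied from the bwctl_8h_tcp_test specs above.
summary = describe_test_spec(28800, 30, 134217728)
for key, value in summary.items():
    print(key, value)
```

An interval of 28800 seconds works out to 8 hours, which matches the `8h` in the test spec names, and 134217728 bytes is 128 MiB.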



CENIC-dashboard.conf -- groups: mesh and disjoint

<group cenic_bwctl_100G_v4>
type mesh

member perfsonar.nersc.gov
member fiona-ps.ucsc.edu
member ps-100g-hpr01.stanford.edu
member ps-100g-scidmz-0.tools.ucla.net
member ps-100g.sdsu.edu
</group>

<group cenic_bwctl_40G_v6>
type mesh

member bost-pt2-v6.es.net
member fiona-ps.net.berkeley.edu
member ps-40g-hpr01-v6.stanford.edu
member ps-antl-meter-40g-v6.nren.nasa.gov
member fiona.ucsc.edu
member ps-40g-scidmz-0.tools.ucla.net
member perf-main-40.ucr.edu
member ps-40g-prism.calit2.optiputer.net
member ps-40g-v6.sdsu.edu
</group>

<group cenic_disjoint_throughput_10G_v4>
type disjoint

a_member ps-svl-10g.cenic.net
a_member ps-lax-10g.cenic.net

b_member perf-scidmz-data.cac.washington.edu
b_member cc-bonsai-perfsonar.bonsai.uoregon.edu
b_member melange.noc.ucdavis.edu
b_member ps-border1-pt.lbl.gov
b_member ucsf-perfsonar1.ucsf.edu
b_member ps-arc-meter.nren.nasa.gov
b_member ps-10g-hpr01.stanford.edu
b_member dps10.ucsc.edu
b_member ucm-perfsonar00.ucmerced.edu
b_member ps-prp-10g.noc.ucsb.edu
b_member perfsonar.csusb.edu
b_member perfsonar.ultralight.org
b_member perfsonar.caltech.edu
b_member hpc-perfsonar.usc.edu
b_member ps-bw.ln.net
b_member ps10g-asm2.tools.ucla.net
b_member perf-main.ucr.edu
b_member fiona-ps.lp.ucinet.uci.edu
b_member speedtest.ucsd.edu
b_member ps10g.sdsc.edu
b_member ps-10g-prism.calit2.optiputer.net
b_member perfsonar.sdsu.edu
</group>
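
To get a feel for how the two group types differ in scale, the host pairs each one implies can be enumerated. This sketch assumes the usual reading of the group types: a mesh group tests every member against every other member, and a disjoint group tests each a_member against each b_member. The function names are hypothetical, and the disjoint example uses only a three-host subset of the b_member list for brevity.

```python
from itertools import combinations, product


def mesh_pairs(members):
    """Every unordered pair of members in a full mesh."""
    return list(combinations(members, 2))


def disjoint_pairs(a_members, b_members):
    """Every (a_member, b_member) pair in a disjoint group."""
    return list(product(a_members, b_members))


mesh_100g = [
    "perfsonar.nersc.gov",
    "fiona-ps.ucsc.edu",
    "ps-100g-hpr01.stanford.edu",
    "ps-100g-scidmz-0.tools.ucla.net",
    "ps-100g.sdsu.edu",
]
a_side = ["ps-svl-10g.cenic.net", "ps-lax-10g.cenic.net"]
b_subset = [  # illustrative subset of the b_member list
    "melange.noc.ucdavis.edu",
    "ps10g.sdsc.edu",
    "perfsonar.sdsu.edu",
]

print(len(mesh_pairs(mesh_100g)))              # 5 hosts -> 10 pairs
print(len(disjoint_pairs(a_side, b_subset)))   # 2 x 3 -> 6 pairs
```

A mesh grows roughly with the square of the member count, while a disjoint group grows with the product of the two sides, which is one reason to keep a small set of a_members when many sites need to be tested against a few central hosts.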



CENIC-dashboard.conf -- tests

##########################################################################################
# Tests
##
<test>
description IPv4 Throughput 1G-connected, Disjoint
group cenic_disjoint_1G_v4
test_spec bwctl_1h_tcp_1G_v4
</test>

<test>
description IPv4 Packet Loss 1G-connected, Disjoint
group cenic_disjoint_1G_v4
test_spec owamp_test
</test>

<test>
description IPv4 Traceroute 1G-connected, Disjoint
group cenic_disjoint_1G_v4
test_spec traceroute_test
</test>

<test>
description IPv6 Throughput 1G-connected, Disjoint
group cenic_disjoint_1G_v6
test_spec bwctl_1h_tcp_1G_v6
</test>



ps-dashboard.cenic.net:/etc/maddash/maddash-webui/config.json

{
"title":"CENIC perfSONAR Dashboard",
"defaultDashboard": "CENIC 10G-connected",
"enableAdminUI": true,
"colors": {
0: "green",
1: "yellow",
2: "red",
3: "gray",
4: "black",
5: "orange"
}
}



ps-dashboard.cenic.net:/etc/perfsonar/meshconfig-guiagent.conf

## Use 'mesh' blocks to specify each mesh that the agent should configure
## a display for
# #<mesh>
# ## Use 'configuration_url' to specify the URL where the agent should obtain
# ## the mesh configuration from
# configuration_url https://host.domain.edu/example.json
#
# ## To ensure that the configuration is trusted, you can set the
# ## 'validate_certificate' option to 1. This will validate that the certificate
# ## is valid, and matches the hostname. If the 'validate_certificate' option is
# ## set to 1, the 'ca_certificate_file' option must be set.
# #validate_certificate 0
#
# ## The 'ca_certificate_file' specifies which CAs to use to validate the
# ## certificates.
# #ca_certificate_file /etc/pki/tls/bundle.crt # the default RedHat CAs
# #</mesh>

## You can define more meshes to configure against by adding more 'mesh' blocks.
#<mesh>
# configuration_url https://host.otherdomain.edu/mesh.json
# #validate_certificate 0
# #ca_certificate_file /etc/pki/tls/bundle.crt
#</mesh>



ps-dashboard.cenic.net:/etc/perfsonar/meshconfig-guiagent.conf (continued)

# The default maddash test configurations. If the values aren't specified in
# the mesh, these will be used.
<maddash_options>
<perfsonarbuoy/owamp>
check_command /usr/lib/nagios/plugins/check_owdelay.pl
check_interval 1800
check_time_range 900
acceptable_loss_rate 0
critical_loss_rate 0.01
retry_attempts 1
</owamp>
<perfsonarbuoy/bwctl>
check_command /usr/lib/nagios/plugins/check_throughput.pl
check_interval 28800
check_time_range 86400
acceptable_throughput 900
critical_throughput 500
</bwctl>
</maddash_options>



ps-dashboard.cenic.net:/etc/perfsonar/meshconfig-guiagent.conf (continued)
# The default maddash test configurations. If the values aren't specified in
# the mesh, these will be used.
<maddash_options>
<perfsonarbuoy/owamp>
check_command /usr/lib/nagios/plugins/check_owdelay.pl
check_interval 1800
check_time_range 900
acceptable_loss_rate 0
critical_loss_rate 0.01
retry_attempts 1
</owamp>
<perfsonarbuoy/bwctl>
check_command /usr/lib64/nagios/plugins/check_throughput.pl
check_interval 28800
check_time_range 86400
acceptable_throughput 7500
critical_throughput 5000
</bwctl>
<perfsonarbuoy/bwctl>
grid_name CENIC - IPv4 Throughput 100G-connected
grid_name CENIC - IPv6 Throughput 100G-connected
check_command /usr/lib64/nagios/plugins/check_throughput.pl
check_interval 14400
check_time_range 86400
acceptable_throughput 75000
critical_throughput 50000
</bwctl>
</maddash_options>
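
The acceptable and critical thresholds above effectively split measurements into three bands. The sketch below illustrates that idea only; it is not the logic of check_throughput.pl. It assumes the thresholds and the measurement share the same unit (taken here to be Mbps) and that a value exactly equal to a threshold falls into the better band. Both points are assumptions to verify against the plugin.

```python
def classify_throughput(measured, acceptable, critical):
    """Place a throughput measurement into one of three bands.

    Assumes all three values use the same unit (Mbps here).
    """
    if measured >= acceptable:
        return "acceptable"
    if measured < critical:
        return "critical"
    return "warning"


# Thresholds from the default bwctl block above (7500 / 5000).
for sample in (9400, 6000, 3000):
    print(sample, classify_throughput(sample, 7500, 5000))

# Thresholds from the 100G-connected grids (75000 / 50000).
print(classify_throughput(60000, 75000, 50000))
```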



MaDDash: Measurement Archives

● esmond MA included in perfsonar-core and perfsonar-toolkit installation bundles
● packaged with MaDDash as an independent bundle, perfsonar-centralmanagement,
which also includes the tools to produce the mesh configuration (json)



MaDDash: registering perfSONAR results (during lab)

● Recipe: http://docs.perfsonar.net/multi_mesh_agent_config.html

● Measurement Archive (MA)
○ generate credentials (username, and IP addresses or API key) for participating pS nodes

● MaDDash Server
○ create dashboard / grids of participating pS nodes
○ set check frequency, threshold values

● pS node configuration
○ update /etc/perfsonar/meshconfig-agent.conf to include new mesh

● Verification
○ pS node: tests added to /etc/perfsonar/meshconfig-agent.tasks
○ pS node: tests are scheduled (pscheduler monitor)
○ pS node & MA: results registered
○ MaDDash: checks are finding results and reflecting correctly on dashboard
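
One way to spot-check the "results registered" step is to ask the MA for metadata matching a host pair. The sketch below only builds the query URL and does not contact any server. The filter names (source, destination, event-type, time-range) are given from memory of the esmond perfSONAR REST interface and should be confirmed against its documentation; the function name is hypothetical, and the archive URL and hostnames are taken from earlier slides.

```python
from urllib.parse import urlencode


def esmond_query_url(archive_url, source, destination, event_type, time_range):
    """Build a metadata query URL for an esmond measurement archive."""
    params = {
        "source": source,
        "destination": destination,
        "event-type": event_type,
        "time-range": time_range,  # seconds back from now
    }
    return archive_url.rstrip("/") + "/?" + urlencode(params)


url = esmond_query_url(
    "https://ps-ma-lax.cenic.net/esmond/perfsonar/archive/",
    "ps-lax-10g.cenic.net",
    "ps10g.sdsc.edu",
    "throughput",
    86400,
)
print(url)
```

The resulting URL could then be fetched with a browser or an HTTP client; an empty result for a pair that should be testing suggests the results are not reaching the MA.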



MaDDash: registering GridFTP results (during lab)

● Recipe: http://software.es.net/esmond/perfsonar_gridftp.html

● DTNs - software requirements
○ GridFTP server v6.0 or newer - https://fasterdata.es.net/data-transfer-tools/gridftp/
○ python 2.7
○ esmond-client Python package (includes esmond-ps-load-gridftp script)

● Measurement Archive (MA)
○ generate credentials (username and API key) for the script on the DTNs to use when posting results

● MaDDash Server
○ create dashboard / grids of participating DTNs
○ set check frequency, threshold values
○ if the same GridFTP endpoints are also registering pS (event-type throughput) results to
the same MA as GridFTP transfer results, update GridFTP-related checks in maddash.yaml
to add the --tool gridftp option to the corresponding Nagios command entries

● DTN configuration
○ create cron jobs to schedule transfers, and to parse the GridFTP transfer log and upload results

GridFTP transfer data & metadata registered to esmond MA

Esmond breaks information into metadata and data as described in perfSONAR Client REST Interface.
The metadata describes the parameters of the GridFTP transfer. This includes the following (metadata
field names in parentheses):
● The source IP address (source)
● The destination IP address (destination)
● The fact that the tool used was gridftp (tool-name)
● The number of parallel streams (bw-parallel-streams)
● The TCP window size if set (tcp-window-size)
● If file striping is used, the number of stripes (bw-stripes)
● The GridFTP program used such as globus-gridftp-server (gridftp-program)
● The block size used by GridFTP in the transfer (gridftp-block-size)
● If you give the log scraper the -F option, the name of the file transferred (gridftp-file)
● If you give the log scraper the -N option, the name of the user that made the transfer (gridftp-user)
● If you give the log scraper the -V option, the name of the volume used in the transfer (gridftp-volume)

source: http://software.es.net/esmond/perfsonar_gridftp.html#using-the-registered-data
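
As a rough picture of what such a metadata record might look like, the sketch below assembles a dictionary keyed by the field names listed above. The function name and all values are hypothetical (the addresses come from the reserved documentation range), and fields described above as conditional are omitted when not supplied. It illustrates the field layout only and is not the format produced by the log scraper.

```python
def gridftp_metadata(source, destination, parallel_streams,
                     tcp_window_size=None, stripes=None):
    """Assemble a metadata dictionary using the field names listed above."""
    record = {
        "source": source,
        "destination": destination,
        "tool-name": "gridftp",
        "bw-parallel-streams": parallel_streams,
        "tcp-window-size": tcp_window_size,  # only if set
        "bw-stripes": stripes,               # only if striping is used
    }
    # Drop the conditional fields that were not provided.
    return {key: value for key, value in record.items() if value is not None}


example = gridftp_metadata("192.0.2.10", "192.0.2.20", parallel_streams=4)
print(sorted(example))
```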

