MaDDash: Monitoring and Debugging Dashboard: John Hess, CENIC
● Raises awareness of issues (read: symptoms) which may not have [yet, ever] been
reported
● Provides a baseline performance reference and timeline from which to correlate changes
● Complements ad hoc testing
● Set up pS nodes and DTNs to initiate tests and register results to the MA
● maddash
○ Container package that depends on the maddash-server, maddash-webui, and
perl-perfSONAR_PS-Nagios packages. The package itself installs no additional
software; it simply pulls in those packages.
● maddash-server
○ The backend server that schedules checks and makes results available via a REST/JSON
interface running on an embedded web server. This package depends on Java, which is
also installed during the yum installation process.
● maddash-webui
○ The web pages that display the dashboard. It consists of a set of CGI scripts that run under
Apache. The server contacts the REST server run by the maddash-server package and then
presents the data on the web page.
● nagios-plugins-perfsonar
○ Installs the perfSONAR Nagios checks that can alarm based on throughput, loss and other data
returned by perfSONAR services.
● perfsonar-graphs
○ Provides the performance graphs used by the maddash-webui package for perfSONAR checks.
source: http://software.es.net/maddash/install.html
#########################################################################################
# Mesh Config file for CENIC
#
###
<administrator>
name John Hess
email jhess@cenic.org
</administrator>
<measurement_archive>
type perfsonarbuoy/bwctl
read_url https://ps-ma-lax.cenic.net/esmond/perfsonar/archive/
write_url https://ps-ma-lax.cenic.net/esmond/perfsonar/archive/
</measurement_archive>
<measurement_archive>
type traceroute
read_url https://ps-ma-lax.cenic.net/esmond/perfsonar/archive/
write_url https://ps-ma-lax.cenic.net/esmond/perfsonar/archive/
</measurement_archive>
<measurement_archive>
type perfsonarbuoy/owamp
read_url https://ps-ma-lax.cenic.net/esmond/perfsonar/archive/
write_url https://ps-ma-lax.cenic.net/esmond/perfsonar/archive/
</measurement_archive>
## SDSC
<organization>
description SDSC
<site>
description SDSC
<host>
description ps10g.sdsc.edu
address ps10g.sdsc.edu
address nate.sdsc.edu
address 192.12.207.22
address 2001:48d0:100:1::22
</host>
</site>
</organization>
<host>
description ps-40g-scidmz-0.tools.ucla.net
address ps-40g-scidmz-0.tools.ucla.net
address 2607:f010:3f9:8004::ea
<measurement_archive>
type perfsonarbuoy/bwctl
read_url https://perfsonar.noc.ucla.edu/esmond/perfsonar/archive
write_url https://perfsonar.noc.ucla.edu/esmond/perfsonar/archive
</measurement_archive>
<measurement_archive>
type traceroute
read_url https://perfsonar.noc.ucla.edu/esmond/perfsonar/archive
write_url https://perfsonar.noc.ucla.edu/esmond/perfsonar/archive
</measurement_archive>
<measurement_archive>
type perfsonarbuoy/owamp
read_url https://perfsonar.noc.ucla.edu/esmond/perfsonar/archive
write_url https://perfsonar.noc.ucla.edu/esmond/perfsonar/archive
</measurement_archive>
</host>
<host>
description speedtest2.pnl.gov
address speedtest2.pnl.gov
no_agent 1
<measurement_archive>
type perfsonarbuoy/bwctl
read_url https://ps-ma-lax.cenic.net/esmond/perfsonar/archive/
write_url https://ps-ma-lax.cenic.net/esmond/perfsonar/archive/
</measurement_archive>
<measurement_archive>
type traceroute
read_url https://ps-ma-lax.cenic.net/esmond/perfsonar/archive/
write_url https://ps-ma-lax.cenic.net/esmond/perfsonar/archive/
</measurement_archive>
<measurement_archive>
type perfsonarbuoy/owamp
read_url https://ps-ma-lax.cenic.net/esmond/perfsonar/archive/
write_url https://ps-ma-lax.cenic.net/esmond/perfsonar/archive/
</measurement_archive>
</host>
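The nested organization, site, and host blocks above can be inspected with a few lines of Python. This is a minimal sketch, not the parser perfSONAR itself uses: `parse_blocks`, `hosts`, and `addresses` are hypothetical helper names, the sample text is a shortened copy of the blocks above, and comment handling is simplified (everything after a `#` is dropped).

```python
def parse_blocks(text):
    """Parse '<tag [name]>' blocks and 'key value' lines into nested dicts."""
    root = {"tag": "root", "name": None, "options": [], "children": []}
    stack = [root]
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # simplification: strip comments
        if not line:
            continue
        if line.startswith("</"):
            stack.pop()  # closing tag ends the current block
        elif line.startswith("<"):
            parts = line.strip("<>").split(None, 1)
            block = {
                "tag": parts[0],
                "name": parts[1] if len(parts) > 1 else None,
                "options": [],
                "children": [],
            }
            stack[-1]["children"].append(block)
            stack.append(block)
        else:
            key, _, value = line.partition(" ")
            stack[-1]["options"].append((key, value.strip()))
    return root


def hosts(block):
    """Yield every <host> block at any nesting depth, in document order."""
    for child in block["children"]:
        if child["tag"] == "host":
            yield child
        yield from hosts(child)


def addresses(host):
    """Return all 'address' values of one host block."""
    return [value for key, value in host["options"] if key == "address"]


SAMPLE = """
<organization>
  description SDSC
  <site>
    description SDSC
    <host>
      description ps10g.sdsc.edu
      address ps10g.sdsc.edu
      address 192.12.207.22
    </host>
  </site>
</organization>
<host>
  description speedtest2.pnl.gov
  address speedtest2.pnl.gov
  no_agent 1
</host>
"""

tree = parse_blocks(SAMPLE)
for host in hosts(tree):
    print(addresses(host))
```

A listing like this is a quick way to confirm that every host carries the addresses expected before the mesh file is published.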
<test_spec bwctl_8h_tcp_test_v6>
# Define a test spec for testing achievable bandwidth once every 8 hours
type perfsonarbuoy/bwctl # Perform a bwctl test (i.e. achievable bandwidth)
tool bwctl/iperf3 # Use 'iperf3' to do the bandwidth test
protocol tcp # Run a TCP bandwidth test
interval 28800 # Run the test every 8 hours (21600 would be every 6 hours)
ipv6_only 1 # force ipv6 only
duration 30 # Perform a 30 second test
force_bidirectional 1 # do bidirectional test
random_start_percentage 25 # randomize start time
omit_interval 5 # ignore first few seconds of test
window_size 134217728 # set 128MB TCP window
</test_spec>
<test_spec bwctl_8h_tcp_test>
# Define a test spec for testing achievable bandwidth once every 8 hours
type perfsonarbuoy/bwctl # Perform a bwctl test (i.e. achievable bandwidth)
tool bwctl/iperf3 # Use 'iperf3' to do the bandwidth test
protocol tcp # Run a TCP bandwidth test
interval 28800 # Run the test every 8 hours (21600 would be every 6 hours)
ipv4_only 1 # force ipv4 only
duration 30 # Perform a 30 second test
force_bidirectional 1 # do bidirectional test
random_start_percentage 25 # randomize start time
omit_interval 5 # ignore first few seconds of test
window_size 134217728 # set 128MB TCP window
</test_spec>
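The numeric values in the two test specs are easy to misread, so a quick unit check may help. The sketch below only converts the `interval` and `window_size` values shown above; it assumes `interval` is in seconds and `window_size` is in bytes, as the inline comments imply.

```python
# Values copied from the test_spec blocks above.
interval_s = 28800        # 'interval' (assumed to be seconds)
window_bytes = 134217728  # 'window_size' (assumed to be bytes)

interval_hours = interval_s / 3600   # seconds per hour
tests_per_day = 86400 / interval_s   # seconds per day / interval
window_mib = window_bytes / 2**20    # bytes per MiB

print(f"interval: {interval_hours:g} h ({tests_per_day:g} runs per day)")
print(f"window_size: {window_mib:g} MiB")
```

Both results agree with the spec names (`bwctl_8h_...`) and with the 128MB window comment.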
##########################################################################################
# Tests
##
<test>
description IPv4 Throughput 1G-connected, Disjoint
group cenic_disjoint_1G_v4
test_spec bwctl_1h_tcp_1G_v4
</test>
<test>
description IPv4 Packet Loss 1G-connected, Disjoint
group cenic_disjoint_1G_v4
test_spec owamp_test
</test>
<test>
description IPv4 Traceroute 1G-connected, Disjoint
group cenic_disjoint_1G_v4
test_spec traceroute_test
</test>
<test>
description IPv6 Throughput 1G-connected, Disjoint
group cenic_disjoint_1G_v6
test_spec bwctl_1h_tcp_1G_v6
</test>
}
}
"title":"CENIC perfSONAR Dashboard",
"defaultDashboard": "CENIC 10G-connected",
"enableAdminUI": true,
"colors": {
0: "green",
1: "yellow",
2: "red",
3: "gray",
4: "black",
5: "orange"
}
}
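As an illustration of how the `colors` map above might be read, the sketch below pairs each numeric key with a color and, where one exists, a status label. It assumes the keys are check status codes. The labels for 0 to 3 are the standard Nagios plugin return codes; the meanings of 4 and 5 are not stated here, so they are left generic. `describe` is a hypothetical helper name.

```python
# Color map copied from the web UI configuration fragment above.
STATUS_COLORS = {
    0: "green",
    1: "yellow",
    2: "red",
    3: "gray",
    4: "black",
    5: "orange",
}

# Standard Nagios plugin return codes; codes 4 and 5 are dashboard-specific
# and intentionally not labeled here.
NAGIOS_LABELS = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def describe(code):
    """Return a one-line description of a status code and its color."""
    color = STATUS_COLORS.get(code, "no color configured")
    label = NAGIOS_LABELS.get(code, "dashboard-specific status")
    return f"{code}: {label} -> {color}"


for code in sorted(STATUS_COLORS):
    print(describe(code))
```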
## Use 'mesh' blocks to specify each mesh that the agent should configure
## a display for
# #<mesh>
# ## Use 'configuration_url' to specify the URL where the agent should obtain
# ## the mesh configuration from
# configuration_url https://host.domain.edu/example.json
#
# ## To ensure that the configuration is trusted, you can set the
# ## 'validate_certificate' option to 1. This will validate that the certificate
# ## is valid, and matches the hostname. If the 'validate_certificate' option is
# ## set to 1, the 'ca_certificate_file' option must be set.
# #validate_certificate 0
#
# ## The 'ca_certificate_file' specifies which CAs to use to validate the
# ## certificates.
# #ca_certificate_file /etc/pki/tls/bundle.crt # the default RedHat CAs
# #</mesh>
## You can define more meshes to configure against by adding more 'mesh' blocks.
#<mesh>
# configuration_url https://host.otherdomain.edu/mesh.json
# #validate_certificate 0
# #ca_certificate_file /etc/pki/tls/bundle.crt
#</mesh>
● Recipe: http://docs.perfsonar.net/multi_mesh_agent_config.html
● MaDDash Server
○ create dashboard / grids of participating pS nodes
○ set check frequency, threshold values
● pS node configuration
○ update /etc/perfsonar/meshconfig-agent.conf to include new mesh
● Verification
○ pS node: tests added to /etc/perfsonar/meshconfig-agent.tasks
○ pS node: tests are scheduled (check with pscheduler monitor)
○ pS node & MA: results registered
○ MaDDash: checks are finding results and reflecting correctly on dashboard
● Recipe: http://software.es.net/esmond/perfsonar_gridftp.html
● MaDDash Server
○ create dashboard / grids of participating DTNs
○ set check frequency, threshold values
○ if the same GridFTP endpoints are also registering pS (event-type throughput) results to
the same MA as GridFTP transfer results, update GridFTP-related checks in maddash.yaml
to add the --tool gridftp option to the corresponding Nagios command entries
● DTN configuration
○ create cron jobs to schedule transfers, and to parse the GridFTP transfer log and upload results
PRP::FIONA Workshop Rehearsal 5-6 February, 2018 26
GridFTP transfer data & metadata registered to esmond MA
Esmond breaks information into metadata and data as described in perfSONAR Client REST Interface.
The metadata describes the parameters of the GridFTP transfer. This includes the following (metadata
field names in parentheses):
● The source IP address (source)
● The destination IP address (destination)
● The fact that the tool used was gridftp (tool-name)
● The number of parallel streams (bw-parallel-streams)
● The TCP window size if set (tcp-window-size)
● If file striping is used, the number of stripes (bw-stripes)
● The GridFTP program used such as globus-gridftp-server (gridftp-program)
● The block size used by GridFTP in the transfer (gridftp-block-size)
● If you give the log scraper the -F option, the name of the file transferred (gridftp-file)
● If you give the log scraper the -N option, the name of the user that made the transfer (gridftp-user)
● If you give the log scraper the -V option, the name of the volume used in the transfer (gridftp-volume)
source: http://software.es.net/esmond/perfsonar_gridftp.html#using-the-registered-data
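The metadata field names listed above can be used as filters when querying the MA. The sketch below only builds a query URL and makes no network call. It assumes the esmond REST interface accepts metadata fields as GET parameters and accepts `format=json`. `build_metadata_query` is a hypothetical helper, and the source address is a documentation-range placeholder rather than a real DTN.

```python
from urllib.parse import urlencode


def build_metadata_query(archive_url, **filters):
    """Build an esmond metadata query URL from keyword filters.

    esmond field names use hyphens, so underscores in the Python keyword
    names are translated (tool_name -> tool-name).
    """
    params = {key.replace("_", "-"): value for key, value in filters.items()}
    params["format"] = "json"  # assumption: request a JSON response
    return archive_url.rstrip("/") + "/?" + urlencode(params)


url = build_metadata_query(
    "https://ps-ma-lax.cenic.net/esmond/perfsonar/archive/",
    tool_name="gridftp",        # only GridFTP transfer results
    event_type="throughput",
    source="192.0.2.10",        # hypothetical placeholder address
)
print(url)
```

Filtering on `tool-name` is what separates GridFTP transfer results from regular perfSONAR throughput results when both are registered to the same MA.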