
•AVID INTERPLAY MAINTENANCE

•Interplay Daily Maintenance Check List

•Avid ISIS

 See "Avid ISIS Recommended Maintenance" on page 315.

•AirSpeed Multi Stream Playout Servers

 Check the Dashboard for "Warnings" or "Alerts." Clear and protect the inventory of materials as required for daily operation.
For more information, see the Avid AirSpeed Multi Stream Installation and User's Guide.

•Infortrend RAID disk set (for Interplay Engine Cluster)

 No reboot maintenance is required for the Infortrend RAID disk set. RAID disk sets such as the Infortrend are designed for 100% uptime and generally do not benefit from being power cycled.
For more information, see the Interplay Engine Failover Guide.

•Interplay Media Indexer

 Use the Avid Service Framework Health Monitor to check memory and CPU usage. If the available memory falls below 10% (red zone in the Health Monitor), the Interplay Media Indexer stops indexing new files.

 Check the Health Monitor to make sure the number of "Table adapter listeners" is consistent with the number of actual clients connected to the Interplay Media Indexer. This number should be close to the actual number of machines connected to the Interplay Media Indexer. If this number keeps growing and becomes greater than the number of clients by an order of magnitude, call Avid Technical Support and schedule a restart for the Interplay Media Indexer.
For more information, see Interplay Production Best Practices.

•Avid Editing Applications

 Restart all Avid editing client systems, including Media Composer, NewsCutter, Symphony, Interplay Assist, and Avid Instinct.
Avid recommends that you reboot the editing systems at the start or end of an editing shift/session. This allows the units to perform power-on self-checks to verify low-level hardware health.
•Interplay Weekly Maintenance Check List

•Avid ISIS

 See "Avid ISIS Recommended Maintenance" on page 315.

•Interplay Engine and Interplay Archive Engine

 Ensure that individual ingest folders do not contain more than 5,000 objects each.
For more information, see "Files Per Folder Limitations" on page 262.

 Delete older AvDeletes log files from the following folder:
\\InterplayEngineName\Program Files\Avid\Avid Interplay Engine\Logs\Machines\InterplayEngineName

•Interplay Transfer Server

 Inspect the Transfer Engine internal disk drive for minimal normal free space. If the disks are reporting higher than normal use, inspect for the presence of large logging or other error-reporting files with recent creation dates. Report any unusual findings to the site management for follow-on activities.
For more information, see the Interplay Transfer Setup and User's Guide.
Starting in Interplay v3.0, Interplay Transfer no longer requires a weekly reboot. A monthly reboot is sufficient. This is because Interplay Transfer v3.x creates a separate process for each playback, ingest, or DET job. If there is a problem with one job, it only affects that particular job and does not affect the Interplay Transfer Engine.

•Interplay Production Services

 Purge all jobs. To purge jobs, use the Interplay Production Services and Transfer Status window.
For more information, see the Interplay Production Services Setup and User's Guide.
Avid recommends that you turn on Auto Purge so you don't have to manually purge jobs.
Starting in Interplay v3.x, you do not need to restart the Interplay Production servers every week. Once a month is sufficient.
•Interplay Media Indexer

 Rebalance the Interplay Media Indexer configuration and/or storages to make sure sufficient indexing memory is available for the Interplay Media Indexer.
For more information, see the Interplay Production Software Installation and Configuration Guide.
Use the Interplay Media Indexer web interface or the Avid Service Framework Service Configuration tool to check and modify the Interplay Media Indexer configuration.

•Interplay Monthly Maintenance Check List

•Interplay Engine

 Check the ratio between the number of database pages and the number of objects in the database. This value is automatically calculated and displayed in the Interplay Administrator tool.
For more information, see "Determining Interplay Database Scalability" on page 171.
Note that the Interplay Engine does not need to be restarted except for the following reasons:
 To verify system health before an upgrade. See the section "Best Practices for Performing an Engine Upgrade" in the Interplay Production Readme.
 Testing the failover capability on a cluster system during regular company maintenance windows (for example, twice a year).

•Interplay Transfer Server

 Delete the following Interplay Transfer temp files from the C: drive:
 My Computer > Local Disk C:\temp
 My Computer > Local Disk C:\tmp
For more information, see the Interplay Transfer Setup and User's Guide.
Files left in the Transfer Engine temporary directories may be due to failed transfer sessions.

 Delete the following Interplay Transfer server log files from the C: drive:
 My Computer > Local Disk C:\ProgramData\Avid\Temp\TMServerLog

 Reboot the Interplay Transfer server CPUs. To do this, stop the Interplay Transfer Engine application and then reboot the CPU.

 Restart the Interplay Transfer Engine application from the desktop shortcut.


•Interplay Production Services

 Restart the Production Services server and Provider systems, including the Transcode Provider and Archive Provider.
For more information, see the Interplay Production Services Setup and User's Guide.
Do not shut down the Production Services server while jobs are in progress. It is fine to shut down the server while jobs are pending. They will restart after the server is restarted.

•Avid ISIS

 See "Avid ISIS Recommended Maintenance" on page 315

•Avid Service Framework

 Use the Health Monitor to check memory and CPU usage of server-side Framework services (Lookup, System Configuration, Time Sync Master, Email). Check to make sure that none have memory that is increasing at an unusual rate and that none have persistently high CPU usage.
For more information, see the Avid Service Framework User's Guide.
•Avid ISIS Recommended Maintenance
•Typically, the Avid ISIS does not need to be power cycled. All components of the Avid ISIS stack can be individually replaced or restarted without interfering with the production use of the Avid ISIS stack.

•Warning: Power cycling the entire stack (all the components at the same time) could risk the stability of the Avid ISIS stack.

•For a detailed description of the maintenance procedures for Avid ISIS, see the Avid ISIS 7000 Setup Guide, Avid ISIS 5000 Setup Guide, or the Avid ISIS 2000 Setup Guide.
•Recommended Maintenance
•Avid recommends setting up a Daily, Weekly, and Monthly maintenance routine for your ISIS system. The following is an outline of each schedule. For full details, see the "Avid ISIS Recommended Maintenance" portion of the setup guide.

•NOTE: If you encounter an unexpected condition, consult the appropriate guides (specific to your release software versions) before executing any corrective measures, to ensure the protection and integrity of the shared storage data.

•Daily Maintenance (approx. 15 min)

(Screenshot: ISIS Management Console dashboard showing the System, Status, Performance, and Bandwidth panels.)

 Check the Storage Manager/Storage Elements status in the System Admin web page. All Storage Managers and Storage Elements should be green; investigate any error statuses.
 Check the Status side bar for messages.
 Log into the active and standby System Directors and open the System Director Control Panel. Indicators in the System Director Status should be green or blue (no red or yellow). Check that the Metadata Status is green. The date stamp should be current.
 Check that the Avid ISIS Workspaces have "Free Space" available.
 Check that your total system used space is below 85% (for optimum performance).
 Check the system event logs on the System Director for recent error events.

•Weekly Maintenance (approx. 30 min)

 Open a command window and ping your switch. The switch should be reachable and able to access any client on the switch. (A scripted version of these ping checks is sketched after this list.)
 Ping the System Director and each Engine.
 Ping the default gateway (if on a corporate network) from the System Director. This should be accessible from any point in the network.
 Review the Windows Event logs on the Storage Manager Engines.
 For Interplay supporting systems, inspect the folders to ensure that none exceed 10,000 files.
 Inspect the Active and Standby System Director hosts for free space on the local disk volumes (C: and D:) and remove any superfluous files (for example, left-over metadata copies from a recovery or migration event).
 Inspect the Active and Standby System Director hosts for any new MEMORY.DMP files that could be indicative of an operating system or application crash event.
 Inspect the clocks on the Active and Standby System Director hosts to confirm alignment to the site's time synchronization time master(s).
 Inspect the disk section of the Storage Manager web agent for excessively long IOs.
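
•The ping checks above can be scripted from an administrative host rather than typed one at a time. This is a minimal sketch only: the hostnames are placeholders for your site's switch, System Directors, Engines, and default gateway, and it assumes a Linux (or similar) shell with the standard ping utility.

# Weekly reachability checks (sketch); the hostnames below are placeholders.
for host in isis-switch isis-sysdir-a isis-sysdir-b isis-engine1 isis-engine2 default-gateway; do
    if ping -c 4 "$host" > /dev/null 2>&1; then
        echo "OK: $host is reachable"
    else
        echo "WARN: $host did not respond to ping"
    fi
done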
•MEDIACENTRAL MAINTENANCE
•Daily Maintenance

•Complete the following steps for a cluster configuration:
1. Log in to Linux as the "root" user on the cluster master node.
2. Run the "crm_mon -f1" command to display the cluster resources and status.
•The "-f1" switch tells the Cluster Resource Monitor to display the output, with failures, only once.
3. Verify that there are no messages or errors under "Migration summary". A healthy cluster should be free of messages and errors.

•For more information, see "Interpreting Failures in the Cluster" on page 84.
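
•For day-to-day use, the check above reduces to a single command run as root on the master node. The following is a minimal sketch; the 20-line grep window is an arbitrary choice, not an Avid-documented value.

# Display cluster resources and fail counts once, then focus on the failure summary.
crm_mon -f1 | grep -A20 "Migration summary"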

•Weekly Maintenance
•The following procedures should be completed on a weekly basis and should take approximately 2
minutes per server or node. These steps can be completed on a "live", in-production system.

•Complete the following steps for a single-server configuration (a scripted sketch of these checks follows this procedure):

1. Complete all single-server Daily Maintenance procedures.
2. Run the "df -h" command to confirm that all expected partitions, volumes, and storage(s) are mounted and accessible.
•As you revisit this process over the course of multiple weeks, also monitor the "Use%" for all volumes for changes. Verify that no volume nears 100% and that no volume suddenly increases its Use%. The root (/) directory is of particular importance. Changes should generally be gradual and not drastic.
•For more information, see "Verifying System Mount Points" on page 117.
3. Check system memory usage with the "free" command:
free -m
•MCS memory usage can vary from one installation to another based on the site's workflow. As you revisit this process over the course of multiple weeks, monitor the memory usage to determine your site's average values.
•For more information, see "Investigating Memory Usage" on page 118.
4. Verify time synchronization with the NTP server.
•For more information, see "Troubleshooting Time Synchronization" on page 121.
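
•The command-line portion of these weekly checks can be wrapped in a short script. This is a minimal sketch under the assumption that the standard df, free, and ntpq utilities are available on the server; the 90% usage threshold is an arbitrary example value, not an Avid recommendation.

#!/bin/bash
# Weekly single-server spot checks (sketch).
df -h                                   # confirm all expected volumes are mounted
df -hP | awk 'NR>1 && int($5) > 90 {print "WARN: " $6 " at " $5}'   # flag volumes nearing full
free -m                                 # note memory usage and compare with prior weeks
ntpq -p                                 # confirm time synchronization with the NTP server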

•Complete the following steps for a cluster configuration (a scripted sketch of the per-node checks follows this procedure):

1. Complete all cluster Daily Maintenance procedures.
2. Run the "df -h" command to confirm that all expected partitions, volumes, and storage(s) are mounted and accessible.
•This command should be run on each cluster node.
•As you revisit this process over the course of multiple weeks, also monitor the "Use%" for all volumes for changes. Verify that no volume nears 100% and that no volume suddenly increases its Use%. The root (/), DRBD (/mnt/drbd), and Gluster (/cache/) directories are of particular importance. Changes should generally be gradual and not drastic.
•For more information, see "Verifying System Mount Points" on page 117.
3. Run the "drbd-overview" command on the master and slave nodes to confirm their Connected status. The master node should be listed as Primary and the slave as Secondary.
•For more information, see "Verifying the DRBD Status" on page 72.
4. Check the DRBD volume's "max mount count" on the master node by running the following command:
tune2fs -l /dev/drbd1 | grep -E 'Mount|Max'
•If the "Mount count" is greater than the "Maximum mount count", see the related article on the Avid Knowledge Base for more information and a process to resolve the issue.
5. Check system memory usage with the "free" command:
free -m
•MCS memory usage can vary from one installation to another based on the site's workflow. As you revisit this process over the course of multiple weeks, monitor the memory usage to determine your site's average values.
•For more information, see "Investigating Memory Usage" on page 118.
6. Verify time synchronization with the NTP server.
•For more information, see "Troubleshooting Time Synchronization" on page 121.
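
•The per-node checks above can be gathered into one loop run from the master node. This is a minimal sketch only; it assumes SSH access between the nodes, and the node names wavd-mcs01 and wavd-mcs02 are placeholders in the style of the examples later in this document.

#!/bin/bash
# Weekly cluster spot checks (sketch); run as root on the master node.
for node in wavd-mcs01 wavd-mcs02; do
    echo "===== $node ====="
    ssh root@"$node" "df -h; free -m; ntpq -p"
done
drbd-overview                                   # master should be Primary, the slave Secondary
tune2fs -l /dev/drbd1 | grep -E 'Mount|Max'     # compare Mount count with Maximum mount count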

•Monthly Maintenance

•The following procedures should be completed on a monthly basis and should take approximately 5 minutes per server. These steps can be completed on a "live", in-production system.

•Throughout the process, watch for any warning messages or errors. If any issues are encountered, investigate and resolve them prior to releasing the system for use. If needed, contact Avid Customer Care at 800-800-AVID (2843) for assistance.

•There are no monthly maintenance recommendations for a single-server configuration at this time.
•Complete the following steps for a cluster configuration:
1. Complete all cluster Daily Maintenance and Weekly Maintenance procedures.
2. Run the "rabbitmqctl cluster_status" command to verify the status of the RabbitMQ cluster. Repeat this command on all cluster nodes.
•For more information, see "Verifying the Status of RabbitMQ" on page 71.
3. Run the "acs-query" command to check the health of the ACS bus. Repeat this command on all cluster nodes.
•For more information, see "Verifying ACS Bus Functionality" on page 74.
4. If your system has been configured with Gluster for cache volume replication, verify that all Gluster peers are known:
gluster peer status

•A two-node cluster will report information similar to the following:

[root@wavd-mcs01 ~]# gluster peer status
Number of Peers: 1
Hostname: wavd-mcs02
Uuid: e54a6d62-fhhb-421b-b44f-33a1e2g7a297
State: Peer in Cluster (Connected)

•Confirm that all nodes, excluding the local node, are "(Connected)".

5. If your system has been configured with Gluster for cache volume replication, use the following command to verify that all Gluster volumes or "bricks" are mounted on each of the cluster nodes:
gluster volume info

•The following is an example of the "gl-cache-dl" volume in a two-node cluster:

Volume Name: gl-cache-dl
Type: Replicate
Volume ID: 159e1427-6bba-4956-921:30-c1e54a377793
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: wavd-mcs01:/cache/gluster/glusterdatadownload
Brick2: wavd-mcs02:/cache/gluster/glusterdatadownload
Options Reconfigured:
storage.owner-uid: 497
storage.owner-gid: 497

•Each cluster node should be listed with a Brick# entry under the "Bricks" section.
6. Manually verify that Gluster is replicating data across the cluster nodes.
a. From any cluster node, create test files on the following Gluster shares:
touch /cache/download/test001.txt
touch /cache/fl_cache/test002.txt

b. Verify that the files created on your local system are replicated to all other cluster nodes. This can be accomplished by either opening an SSH session to each cluster node or using the Linux ssh command to verify the file replication from your current node:
ssh root@<node> ls <folderpath>

•Where <node> is the hostname of the remote server and <folderpath> is the location of the test file.
•You might be prompted to confirm that you wish to connect to the remote system. Enter "yes" to continue. You will also be prompted for the "root" user password of the remote system.
c. Once you have verified that file replication is functioning normally, remove the test files from the /cache directory:
rm /cache/download/test001.txt && rm /cache/fl_cache/test002.txt

•You will be asked to confirm that you wish to remove the files. Type: yes
•The local and replicated copies of the files are deleted.
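
•A condensed, scripted version of the monthly cluster spot checks might look like the following. It is a sketch only: the node names wavd-mcs01 and wavd-mcs02 are placeholders, it assumes the /cache/download and /cache/fl_cache Gluster shares shown above, and rm -f is used so the cleanup does not prompt for confirmation.

#!/bin/bash
# Monthly cluster spot checks (sketch); run from any cluster node as root.
for node in wavd-mcs01 wavd-mcs02; do
    echo "===== $node ====="
    ssh root@"$node" "rabbitmqctl cluster_status; acs-query"    # RabbitMQ and ACS bus health
done
# Gluster replication spot check: create test files, confirm them on every node, then clean up.
touch /cache/download/test001.txt /cache/fl_cache/test002.txt
for node in wavd-mcs01 wavd-mcs02; do
    ssh root@"$node" "ls -l /cache/download/test001.txt /cache/fl_cache/test002.txt"
done
rm -f /cache/download/test001.txt /cache/fl_cache/test002.txt   # remove the test files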

•Common Troubleshooting Commands


•The following table lists some helpful commands for general troubleshooting:

•Command Description

•acs-query
Tests the RabbitMQ message bus and the avid-acs-ctrl-core service. With MediaCentral 2.5 and higher, this command also tests the avid-acs-gateway service.

•avid-db dumpall
Backs up the MCS databases.

•corosync-cfgtool -s (cluster only)
Returns the IP and other stats for the node on which you issue the command.

•corosync-objctl | grep member (cluster only)
Returns the IP addresses of all nodes in the cluster.

•crm (cluster only)
Launches the Pacemaker Cluster Resource Manager in a shell mode.
Once in the crm shell, tab twice for a list of options at each level. Type help for a list of commands. Press q to exit the help file. Press CTRL-C on a Windows keyboard to exit the crm shell.

•crm_mon [-f] (cluster only)
Opens the Pacemaker Cluster Resource Monitor.
The -f option displays the fail-count for all services managed by Pacemaker.

•dmidecode | grep -A2 "System Information"
If you are accessing the server from a remote SSH session, this command prints the server information to the screen. Example:
System Information
Manufacturer: HP
Product Name: ProLiant DL360p Gen8

•watch 'crm_mon -f1 | grep -A100 "Migration summary"' (cluster only)
Depending upon your configuration and the number of managed resources, it can be difficult to see all messages related to the cluster when using the crm_mon -f command. This watch command provides a live status of the last 100 lines of the output of crm_mon following the "Migration summary".
The 100 value can be increased or decreased as desired.

•drbd-overview (cluster only)
Prints DRBD status information to the screen. This information can also be obtained through the following command: service drbd status

•gluster (cluster only)
Queries GlusterFS peers, e.g. gluster peer [command], gluster peer probe.

•ics_version
Prints MCS version information to the screen.

•ping -c <count> <hostname or IP address>
Verifies the connection to a remote system through a network "ping" request. The -c option defines the number of times the ping is sent.
When troubleshooting network issues, it might be useful to add a time stamp to the ping request so that the ping can be compared against log files. That command looks like the following:
ping <hostname or IP address> | perl -nle 'print scalar(localtime), " ", $_'
To send the output of a time-stamped ping request to a file, use the following command:
ping <hostname or IP address> | perl -nle 'BEGIN {$|++} print scalar(localtime), " ", $_' > <filename>
Press CTRL-C to stop the command and close the file.
For more information on the use of ping, see "Verifying Network Connectivity" on page 63.

•ps -ae | grep [-c] intern
This command polls the max-edit player and returns information regarding the connections to the player on the current server. Example:
[root@wavd-mcs01 ~]# ps -ae | grep intern
105036 pts/0 00:00:49 max-edit-intern
If you have a cluster with multiple servers, this command can help you verify which node is servicing a playback request. Adding the [-c] parameter will give you a count of playback streams being serviced by the current server.

•system-backup [-b | -r]
Often run prior to an MCS upgrade, this script backs up the system settings and MCS databases. When run with the -r option, the script restores the backed-up data.
For more information, see the Appendix of the MediaCentral Platform Services Upgrade Guide.
