Opsview Usage for Managed Server Customers

This article is intended only for customers of MIT's Managed Server Hosting. These customers will have ops-help@mit.edu as the primary contact for all server questions.

Server Operations monitors our servers using Opsview, a commercial product that uses Nagios checks internally. Detailed monitoring of functionality is offered as part of the Managed Server offering, and detailed monitoring of server infrastructure is used internally.

Monitoring can be external (one of the Opsview workers makes requests), internal (a daemon runs on the server being monitored and reports state when queried), or passive (a cronjob runs on the server being monitored and reports state).

Basic Access (Everybody)

Log in to Opsview by going to https://opsview-access.mit.edu/ for a Touchstone login. After that, URLs will refer to opsview.mit.edu, but you must go to opsview-access.mit.edu first in order to log in. All authenticated users should be able to browse host status; you may find it helpful to search for the desired host group by entering 'hg HOSTGROUP'.

In addition to the web interface, the Zephyr class "nagios" gets real-time notifications for changes in state.

Paging and Downtime

Depending on the SLA status of the host, which team owns it, and the service in question, a service switching into or out of a WARNING or CRITICAL state will either send email or page a staff member.

Please let us know at ops-help@mit.edu if you are planning to perform maintenance on a 24x7 SLA'd server after hours, and we'll schedule downtime in advance. If you are simply taking updates on a CMS (involving a brief interval of maintenance mode, but no httpd downtime) you probably don't need to notify us in advance.

Monitoring Coverage

Server Operations uses a nightly cronjob to look for hosts and services that are on our networks or in our inventories but aren't monitored by Opsview. We aim to monitor all configured SNI vhosts and all ports that are expected to be open. Maintaining this is an ongoing project.

We don't necessarily detect deep functionality - in some cases the basic monitoring will simply detect an HTTP response, rather than underlying functionality. Likewise, some checks are more sensitive than is strictly necessary; checks that assess a version or configuration status (such as VMWARE-TOOLS or PUPPET-LAST-RUN) will sometimes be in a WARNING or CRITICAL state without harm.

Custom Checks

We have a flexible set of tools for doing deep monitoring of most services. Please ask us to set up what you need.

Here are some examples:

HTTP/HTTPS

  • We can hit a deep URL and search for a particular string on the page
  • HTTPS checks will present opuser@mit.edu's personal certificate
  • Checks that report success on any HTTP response (including 400 errors) are available
  • Checks that report failure sluggishly (suitable for a dependency on an unavoidably slow third party service) are available
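
To make the first and third bullets concrete, here is a minimal Python sketch of how such a check could decide its result. It is not the check Server Operations actually runs: the function names (evaluate, fetch) and the any_response parameter are hypothetical, and only the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) follow the standard Nagios plugin convention.

```python
import urllib.error
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(status, body, needle=None, any_response=False):
    """Map an HTTP status and body to a (code, message) pair.

    needle and any_response are hypothetical parameter names used
    only for this illustration.
    """
    if any_response:
        # Lenient variant: the server answered at all, so report OK.
        return OK, f"HTTP OK: got a response (status {status})"
    if status >= 400:
        return CRITICAL, f"HTTP CRITICAL: status {status}"
    if needle is not None and needle not in body:
        return CRITICAL, f"HTTP CRITICAL: string {needle!r} not found"
    return OK, f"HTTP OK: status {status}"


def fetch(url, timeout=10.0):
    """Return (status, body) for url.

    HTTP error statuses (4xx/5xx) are returned rather than raised;
    connection failures still raise urllib.error.URLError.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", errors="replace")
```

A deep-URL check would call fetch on the page in question and pass the result to evaluate along with the string you expect to see; tell us the URL and the string, and we will configure the real check.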

PROCS

  • Look for a process to exist, or to exist at a particular count. This can be used to detect failure of a service that doesn't present an externally testable interface, or to detect a daily process that may queue up too many copies if it hangs.
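
The process-count logic can be sketched as follows. This is an illustration only, not the plugin Opsview uses: count_processes and evaluate_count are hypothetical names, the minimum/maximum thresholds are assumptions, and the ps invocation assumes a Linux host (where the command name is truncated to 15 characters).

```python
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def count_processes(name):
    """Count running processes whose command name equals name.

    Assumes a Linux-style ps; 'comm=' prints the bare command name
    with no header line.
    """
    out = subprocess.run(
        ["ps", "-eo", "comm="], capture_output=True, text=True, check=True
    ).stdout
    return sum(1 for line in out.splitlines() if line.strip() == name)


def evaluate_count(count, minimum=1, maximum=None):
    """Return (code, message) for a process count against bounds."""
    if count < minimum:
        # Too few: the service has probably died.
        return CRITICAL, f"PROCS CRITICAL: {count} running, expected at least {minimum}"
    if maximum is not None and count > maximum:
        # Too many: e.g. a daily job hanging and piling up copies.
        return CRITICAL, f"PROCS CRITICAL: {count} running, expected at most {maximum}"
    return OK, f"PROCS OK: {count} running"
```

For a daemon you would typically ask for a minimum of 1; for a daily job that must not pile up, a minimum of 0 and a small maximum.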

PASS-LOGFILES

  • Look for a message in a file written to disk. See also (other options for monitoring logs). This is most appropriate for when a crashing process needs to page a staff member, and has drawbacks as a general logging tool.
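
As a rough sketch of the idea, a passive log check scans for a pattern and reports the result back to the monitoring system. The function names and the host name in the usage note are hypothetical. The result line uses the generic Nagios external command format (PROCESS_SERVICE_CHECK_RESULT); whether our Opsview setup submits results exactly this way is an assumption.

```python
import re
import time

# Standard Nagios plugin exit codes used here.
OK, CRITICAL = 0, 2


def scan_lines(lines, pattern):
    """Return (code, message): CRITICAL if any line matches pattern."""
    regex = re.compile(pattern)
    hits = [line.rstrip("\n") for line in lines if regex.search(line)]
    if hits:
        return CRITICAL, f"LOG CRITICAL: {len(hits)} matching line(s), last: {hits[-1]}"
    return OK, "LOG OK: no matching lines"


def passive_result(host, service, code, message, now=None):
    """Format a passive service result as a Nagios external command line."""
    ts = int(time.time() if now is None else now)
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{code};{message}"
```

For example, scan_lines(open(path), r"FATAL") followed by passive_result("web1", "PASS-LOGFILES", code, message) would yield one result line for a hypothetical host web1. A production check would also remember how far into the file it has already read, so that an old message does not page someone repeatedly.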

Custom Notifications

If you'd like to receive notifications in email about your managed server, please let us know and we'll set this up for you. The "notification profile" required is manually maintained, so we would generally prefer to notify a single mailing list for each group of servers, rather than a separate address per server.

IS&T Contributions

Documentation and information provided by IS&T staff members


Last Modified:

May 20, 2019
