Ping Scripts

This website demonstrates simple scripts that monitor servers by pinging them. They are designed to illustrate different strategies for monitoring, and can be used as a starting point for many situations.

Note: This page was written in 2003 and is no longer maintained (moved to the Crypt). I'll keep it online as I've had emails over the years to say it has helped people out.

Contents

Monitoring

What we can check
How to present it
Operating Systems

Screenshots and Downloads

Command Line Interface (CLI)
Website Report
CGI Report
Multi daemon

Monitoring

Check

The action that checks the service. The check may be to:

ping the server
connect to a port
simulate an end-user event
monitor performance

The most basic test is to use the ping tool to check if a host is alive. It sends ICMP echo requests and listens for ICMP echo responses. This lets you know that the host address has resolved (if a DNS name was used), and a host with that address has responded. What happens in detail on the remote host is this:

The host's kernel is interrupted by a network device and pulls up a packet, which is processed up the kernel's network stack to the ICMP routines; an ICMP echo response is then returned immediately. The pull-up and response may be queued by the low-level network stack subsystem (eg, soft ring buffers). User-land, where application processes execute, isn't reached at all. Depending on the OS, only a small portion of the kernel (lower-level network, IP and ICMP) is involved.

This means that you haven't actually checked much about the health of the operating system, and haven't checked the health of running applications at all. That's not to say ICMP-based testing is useless: it's very useful data in conjunction with other tests.
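
As a rough illustration of the ICMP check described above, here is a minimal Python sketch that runs the system ping command and reads its exit status. It is not one of the scripts from this page. The function names are hypothetical, and the -c and -W flags assume a Linux (iputils) ping; other platforms take different arguments, so adjust ping_command() to suit.

```python
import subprocess


def ping_command(host, count=1, timeout_secs=2):
    """Build the ping command line. The -c/-W flags are an assumption
    (Linux iputils); other platforms use different arguments."""
    return ["ping", "-c", str(count), "-W", str(timeout_secs), host]


def is_alive(host, run=subprocess.run):
    """Return True if ping exits with status 0 (an echo reply arrived)."""
    try:
        result = run(ping_command(host),
                     stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except OSError:
        # ping is not installed or could not be executed
        return False
    return result.returncode == 0


def describe(host, alive):
    """Format a result in the style of the output shown on this page."""
    return f"{host} is alive" if alive else f"no answer from {host}"
```

Calling describe("mars", is_alive("mars")) would give a single report line. The run parameter exists only so that the command runner can be swapped out when testing without a network.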

The next test would be to check if a port is listening (for example, my portping tool tests TCP). This checks more functionality: that an application has listened on a port and is still present (ie, the process hasn't terminated - if it had, the kernel would have closed the port). While this means the application hasn't died, it could be completely frozen, or the kernel could be inundated with work and be unable to schedule it (which is why TCP has a backlog). If, on connecting to the TCP port, the remote application sends a protocol string (eg, SSH on port 22 replies with something like "SSH-2.0-OpenSSH_5.3p1 Debian-3ubuntu4"), then you know a lot more: the kernel was able to complete the accept() syscall, schedule the application process, and the application process was able to send an initial response.
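
The port test can be sketched in a few lines of Python using the standard socket module. This is an illustration only, not the portping tool; check_port() and its parameters are hypothetical names. It reports whether the connection succeeded and, if the service speaks first, the banner it sent.

```python
import socket


def check_port(host, port, timeout_secs=3.0, read_banner=True):
    """Connect to a TCP port; return (is_open, banner_or_None)."""
    try:
        with socket.create_connection((host, port),
                                      timeout=timeout_secs) as sock:
            if not read_banner:
                return True, None
            try:
                data = sock.recv(256)
            except socket.timeout:
                # Connected, but the service sent nothing first (eg, HTTP)
                return True, None
            banner = data.decode("ascii", errors="replace").strip()
            return True, banner or None
    except OSError:
        # Refused, unreachable, timed out, reset, or name did not resolve
        return False, None
```

A result of (True, None) means only that something accepted the connection. A result with a banner means the application process was scheduled and was able to respond.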

Testing further would involve simulating a client transaction and checking the response, which can confirm that the application is not just running but also behaving normally. If it was a web server, a script could fetch a particular web page and check its MD5 and response time - tools like wget may assist. If a database server is to be tested, a database fetch could be attempted and the data checked - using Perl and CPAN libraries can help here. Using such specific tests is sometimes called "focused monitoring".
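
A web page check of this kind might look like the following Python sketch, which uses only the standard library in place of wget. The function names, the default thresholds, and the idea of returning a list of problems are all assumptions made for illustration.

```python
import hashlib
import time
import urllib.request


def evaluate(body, elapsed_secs, expected_md5, max_secs):
    """Compare a fetched body and its timing against expectations."""
    problems = []
    actual = hashlib.md5(body).hexdigest()
    if actual != expected_md5:
        problems.append(f"content changed (md5 {actual})")
    if elapsed_secs > max_secs:
        problems.append(
            f"slow response ({elapsed_secs:.2f}s > {max_secs:.2f}s)")
    return problems


def check_page(url, expected_md5, max_secs=2.0, timeout_secs=10.0):
    """Fetch url and return a list of problems (empty means healthy)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_secs) as resp:
            body = resp.read()
    except OSError as exc:
        # URLError and HTTPError are both OSError subclasses
        return [f"fetch failed ({exc})"]
    return evaluate(body, time.monotonic() - start, expected_md5, max_secs)
```

An MD5 comparison only suits pages whose content is static; for dynamic pages, searching the body for an expected string may be a better fit.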

Now, if we have gone to the effort to check a particular detail by writing our own script, it is helpful to have this script update a log whenever the check is performed. If a user were to say "the service was slow at 9am this morning", it is nice to have a log where we have recorded service response times from our end-user simulation script.
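
One simple way to keep such a log is to append a timestamped line per check. The sketch below is a hypothetical helper; the line format is an assumption, chosen so that the file is easy to search for a given time.

```python
import time


def log_check(path, service, ok, response_secs, now=None):
    """Append one timestamped line per check and return the line written.
    now is seconds since the epoch; None means the current time."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    status = "OK" if ok else "FAIL"
    line = f"{stamp} {service} {status} {response_secs:.3f}s\n"
    with open(path, "a", encoding="utf-8") as log:
        log.write(line)
    return line
```

With a log like this, a report that "the service was slow at 9am" can be checked by searching the file for that hour.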

Presentation

This is how we draw attention to problems. Some things to consider:

web sites
colour coding
email alerts
pager alerts
audio alerts

Colour coding is very effective: red=bad, green=good. This is sometimes called "traffic lights", and will allow any staff member to understand your reports. However, be careful about false positives and negatives: objective metrics suit colour coding (eg, hardware or failures), whereas subjective metrics may not (eg, performance).
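
As a small sketch of traffic-light colouring in a web report, the following Python builds colour-coded HTML from a list of results. The function names and the markup are assumptions made for illustration, not the output format of the scripts on this page.

```python
import html

COLOURS = {True: "green", False: "red"}  # traffic-light convention


def render_row(host, alive):
    """Return one colour-coded HTML paragraph for a host."""
    text = f"{host} is alive" if alive else f"no answer from {host}"
    return f'<p style="color: {COLOURS[alive]}">{html.escape(text)}</p>'


def render_report(results, title="Ping report"):
    """results is a list of (host, alive) pairs; returns a full page."""
    rows = "\n".join(render_row(host, alive) for host, alive in results)
    return (f"<html><head><title>{html.escape(title)}</title></head>\n"
            f"<body>\n{rows}\n</body></html>\n")
```

Only a clear up or down result is coloured here, in keeping with the caution above about subjective metrics.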

Audio can be either keyboard beeps or recorded samples played through an audio card. Email or pager alerts should only be attempted if the code is "stateful" - a message is sent only when a change happens.
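
The "stateful" idea can be sketched as a small class that remembers each host's last result and reports only transitions. The class name and return values are hypothetical; treating the first sighting of a healthy host as "no change" is a design assumption.

```python
class StateTracker:
    """Remember each host's last known state and report only changes."""

    def __init__(self):
        self.last = {}  # host -> True (up) or False (down)

    def update(self, host, alive):
        """Record a result; return 'down', 'up', or None if no change."""
        previous = self.last.get(host)
        self.last[host] = alive
        if previous == alive:
            return None
        if previous is None and alive:
            # First sighting of a healthy host is not worth an alert
            return None
        return "up" if alive else "down"
```

An alerting script would send an email or page only when update() returns something other than None, so a host that stays down for an hour produces one message rather than sixty.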

Operating Systems

The examples on this website are written for Unix or Linux, as they are ideal platforms to run monitoring tools from. This is because it is common for an install to have powerful scripting tools such as sh, ksh, bash, sed, awk; a powerful language with network libraries such as perl or python; tools to send emails such as sendmail, mail, mailx; and a webserver that may be needed to host the reports.

Other platforms such as Microsoft Windows could be used, although this may require installing extra software such as a perl distribution.

Screenshots and Downloads

Strategy 1 - CLI

pinghosts is a simple command line program to ping hosts in /etc/hosts and colour the output. Variants could be written to ping an /etc/prodhosts file, or /etc/devhosts - whatever is suitable. Using files like this makes maintenance easy.

$ pinghosts
Checking 127.0.0.1: 127.0.0.1 is alive
Checking 192.168.1.1: 192.168.1.1 is alive
Checking 192.168.1.2: no answer from 192.168.1.2
Checking 192.168.1.5: 192.168.1.5 is alive
Checking 192.168.1.150: 192.168.1.150 is alive
Checking 192.168.1.151: no answer from 192.168.1.151
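
The two ingredients of a tool like this, reading addresses from a hosts-format file and colouring the output, can be sketched in Python as follows. This is not the source of pinghosts; the function names are hypothetical, and the colours assume a terminal that understands ANSI escape sequences.

```python
GREEN, RED, RESET = "\033[32m", "\033[31m", "\033[0m"


def parse_hosts(text):
    """Return the unique addresses from hosts-file text, in file order."""
    addresses = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        address = line.split()[0]  # the first field is the address
        if address not in addresses:
            addresses.append(address)
    return addresses


def colourise(message, alive):
    """Wrap a message in green (alive) or red (no answer) escape codes."""
    return f"{GREEN if alive else RED}{message}{RESET}"
```

Because parse_hosts() takes text rather than a filename, the same function would work for /etc/hosts, /etc/prodhosts or any other file in the same format.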

Strategy 2 - Website

getping.sh pings servers and produces a colour coded HTML page, which is served by a webserver. It could be scheduled via crontab to update every 5 minutes. Example website:

Ping began at Saturday January 25 22:33:20 EST 2003

venus,
venus is alive
earth,
earth is alive
mars,
mars is alive
phobos,
no answer from phobos
192.168.1.1,
192.168.1.1 is alive

Completed at Saturday January 25 22:33:26 EST 2003

Strategy 3 - CGI

getping.cgi pings servers and produces a colour coded HTML report. As a CGI, it is triggered from a browser "on demand" to produce live data. Example website:

Ping began at Saturday January 25 22:34:30 EST 2003

venus,
venus is alive
earth,
earth is alive
mars,
no answer from mars
phobos,
phobos is alive
192.168.1.1,
192.168.1.1 is alive

Completed at Saturday January 25 22:34:36 EST 2003
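
What makes a script a CGI is mostly its output: a Content-Type header, a blank line, then the document, all written to standard output each time the page is requested. A minimal Python sketch follows; cgi_response() is a hypothetical name and the sample lines are placeholders, not the source of getping.cgi.

```python
import html
import sys


def cgi_response(lines):
    """Build a complete CGI response: header, blank line, HTML body."""
    body = "<html><body>\n"
    body += "".join(f"<p>{html.escape(line)}</p>\n" for line in lines)
    body += "</body></html>\n"
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # A real CGI would run its checks here, at request time
    sys.stdout.write(cgi_response(["venus is alive", "no answer from mars"]))
```

Since the checks run while the browser waits, short ping timeouts matter more here than in a report generated from crontab.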

Strategy 4 - Multi (Email, Syslog, CLI and Web)

watchping is a watchdog program to ping servers and take action if they go down. It is designed to run as a daemon so that it can be stateful - eg, only email when something changes, not every time a check is made. The four actions are to send email alerts, send syslog alerts, log everything, and generate a website. By default it uses email and syslog.

Example website:

WatchPing Report, Friday January 3 03:07:13 EST 2003

mars is alive
no answer from phobos
localhost is alive
127.0.0.1 is alive
no answer from 192.168.1.5

Example running 1:

Here, watchping is run in verbose mode for the hosts mars and phobos. Phobos is down, so it sends a syslog message, emails root a message, and prints messages to the screen (verbose):

# ./watchping -v mars phobos
Running WatchPing...
Sleep Interval: 60 secs
Email address: root
Syslog priority: user.err
Checking Hosts: mars phobos
-----
Friday January 3 02:34:37 EST 2003
mars is alive
no answer from phobos
-----
Friday January 3 02:35:42 EST 2003
mars is alive
no answer from phobos


# tail -1 /var/adm/messages
Jan  3 02:34:42 mars watchping: [ID 702911 user.alert] Hosts Down: phobos   


# mail
From root@mars.dev.com Fri Jan 3 02:35:47 2003
Date: Fri, 3 Jan 2003 02:35:47 +1100 (EST)
From: Root 
Message-Id: <200301231535.h0NFzLVF029404@mars.dev.com>
Subject: WatchPing Alert: phobos
Content-Length: 159

WatchPing Alert

The following hosts failed to ping,
 phobos

Output from all pings,

Friday January 3 02:35:42 EST 2003
mars is alive
no answer from phobos
     
     
?
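
An alert message shaped like the one above could be composed with Python's standard email library, as in this sketch. It is not how watchping itself sends mail; the function names are hypothetical, and send_alert() assumes a mail server is listening on localhost.

```python
import smtplib
from email.message import EmailMessage


def build_alert(down_hosts, ping_output, sender, recipient):
    """Compose an alert in the style of the message shown above."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "WatchPing Alert: " + " ".join(down_hosts)
    msg.set_content(
        "WatchPing Alert\n\n"
        "The following hosts failed to ping,\n"
        + "".join(f" {host}\n" for host in down_hosts)
        + "\nOutput from all pings,\n\n"
        + ping_output
    )
    return msg


def send_alert(msg, smtp_host="localhost"):
    """Hand the message to an SMTP server (assumed to be on localhost)."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```

Keeping composition separate from sending means the message text can be checked without a mail server.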

Example running 2:

Watchping can be used as a background process on startup, configured to check against custom lists of hosts, email different addresses, etc. This example demonstrates a combination of actions:

watchping -e sysadmin@mars -w /var/http/prod.html -i /etc/prod.txt &   
watchping -e dbadmin@venus -w /var/http/db.html -i /etc/db.txt &

If I've passed on some ideas for monitoring then the goal of this website has been a success. Good luck!


Created: 25-Jun-2003
Last updated: 30-Dec-2004
Copyright (c) 2003, 2004 Brendan Gregg