Tuesday, January 15, 2013

Monitoring app coming along

When I replaced my UPS with a bigger standalone battery backup system, I realized I needed a way to monitor the hydro fail state as well as the voltage, so that if the voltage drops too low I can remotely shut things down properly.

Then I decided, instead of hard-coding everything, to make a monitoring app, since I also wanted the ability to monitor things such as disk space, RAM, CPU, load average, whether a process is still running, etc.

EnviroState was born. It is kinda like Pandora, Nagios, etc., but its goal is to be sweet and simple, and easy to deploy. It is a client-server setup using agents. The agents connect to the server, get a list of things to monitor, then poll at their given intervals and report the data back to the server. The server then determines whether each value is within its thresholds and fires an alarm if it is not. I recently finished the alarm display page. The system works on the basis of minor, major and critical alarms, so you set the low/high thresholds for each (or you can omit some and use just major, or minor, or critical, or any combination of the three). All configuration is done at the server by editing the config file.
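To make the threshold idea concrete, here is a rough Python sketch of how a reported value could be mapped to an alarm level. This is not EnviroState's actual code: the `Band` class and the `classify` function are names made up for illustration, and treating a value that sits exactly on a threshold as still OK is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Band:
    """Hypothetical low/high limits for one severity; either side may be omitted."""
    low: Optional[float] = None
    high: Optional[float] = None

    def breached(self, value: float) -> bool:
        # A value is out of range if it falls below low or rises above high.
        if self.low is not None and value < self.low:
            return True
        if self.high is not None and value > self.high:
            return True
        return False


def classify(value: float,
             minor: Optional[Band] = None,
             major: Optional[Band] = None,
             critical: Optional[Band] = None) -> str:
    """Return the most severe level whose band is breached, or "ok"."""
    # Check the most severe level first so it wins when several bands are breached.
    for name, band in (("critical", critical), ("major", major), ("minor", minor)):
        if band is not None and band.breached(value):
            return name
    return "ok"
```

For a battery voltage point, for example, you might call `classify(11.4, minor=Band(low=12.0), major=Band(low=11.5), critical=Band(low=11.0))`. Any level left out is simply skipped, which matches the idea of using only major, or only critical, and so on.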

Currently it has only been tested on Linux, but I will eventually test it on Windows, starting with making sure it even compiles. :P  Each monitor point is simply the output of a shell command, so using sed, grep, etc. you can grab pretty much any variable data you want.
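As an illustration of the "monitor point = output of a shell command" idea, here is a minimal Python sketch. The agent itself is a compiled program, so this is only a stand-in for the concept; `read_monitor_point` is a hypothetical name, and the 10 second timeout and the choice to treat a non-zero exit status as an error are assumptions.

```python
import subprocess


def read_monitor_point(command: str, timeout: float = 10.0) -> str:
    """Run a shell command and return its trimmed standard output."""
    result = subprocess.run(
        command,
        shell=True,           # allow pipes through sed, grep, etc.
        capture_output=True,  # collect stdout and stderr
        text=True,            # decode bytes to str
        timeout=timeout,      # do not let a hung command block polling
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"command failed ({result.returncode}): {result.stderr.strip()}"
        )
    return result.stdout.strip()


# Example monitor point on Linux: the 1-minute load average.
# load = float(read_monitor_point("cut -d' ' -f1 /proc/loadavg"))
```

Since the command string goes straight to the shell, it should only ever come from a trusted config file.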

For hydro fail, voltage and other physical monitor points, I use an Arduino circuit board with appropriate sensors. A Python script connects to the serial port (it was easier to do in Python than in C++) and outputs all the alarm points to a text file. EnviroState simply parses the values out of the text file. So with this rather simple concept, pretty much anything can be monitored.
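The text file hand-off can be sketched roughly like this. The real file format is not shown here, so the one `name=value` pair per line layout, the `#` comment lines, and the `parse_points` and `read_points` names are all assumptions made for the example.

```python
from typing import Dict


def parse_points(text: str) -> Dict[str, float]:
    """Parse assumed 'name=value' lines into a dict of numeric readings."""
    points: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, raw = line.partition("=")
        if not sep:
            continue  # not a name=value line
        try:
            points[name.strip()] = float(raw)
        except ValueError:
            continue  # ignore values that are not numeric
    return points


def read_points(path: str) -> Dict[str, float]:
    """Read the file written by the serial script and parse it."""
    with open(path, "r", encoding="utf-8") as handle:
        return parse_points(handle.read())
```

A file containing `hydro_fail=1` and `battery_voltage=12.6` would come back as a dict with those two keys, and each key can then be treated as its own monitor point.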

Here is a sample email, and the alarm list:

I just pulled the plug on my server rack, running on batteries now.   

I still need to add more features but it's coming along.  One feature I want to add is the ability for an alarm point to issue a command when it reaches a certain state.  So when battery voltage reaches critical, for example, it should issue a command that shuts down all the servers.
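Since this feature does not exist yet, the following is only a sketch of one way it could work: run a command once when a point enters a given state, and not again until it has left that state and come back. The `AlarmAction` class, its `update` method and the injectable `runner` argument are all hypothetical.

```python
import subprocess
from typing import Callable


class AlarmAction:
    """Hypothetical hook: run a command when a point enters trigger_state."""

    def __init__(self, trigger_state: str, command: str,
                 runner: Callable = subprocess.run):
        self.trigger_state = trigger_state
        self.command = command
        self.runner = runner      # injectable so the hook can be tested safely
        self.last_state = "ok"    # assume every point starts out healthy

    def update(self, state: str) -> bool:
        """Record the new state; fire only on the transition into trigger_state."""
        entered = (state == self.trigger_state
                   and self.last_state != self.trigger_state)
        self.last_state = state
        if entered:
            self.runner(self.command, shell=True)
        return entered
```

Firing only on the transition avoids re-running a shutdown command on every poll while the voltage stays critical.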

I also plan to add a hydrogen sensor and monitor that too, just for peace of mind. If the charger goes bananas, it could potentially cause the batteries to release hydrogen.

I plan to release a beta of this probably in the next few months or so.   It will be released as open source.
