Tuesday, January 15, 2013

Monitoring app coming along

When I replaced my UPS with a bigger standalone battery backup system, I realized I needed a way to monitor the hydro fail state as well as the battery voltage, so that if the voltage drops too low I can remotely shut stuff down properly.

Then I decided, instead of hard-coding everything, to make a monitoring app, as I also wanted the ability to monitor things such as disk space, RAM, CPU, load average, whether a process is still running, etc.

EnviroState was born. It is kinda like Pandora, Nagios, etc., but its goal is to be sweet and simple, and easy to deploy. It is a client-server setup using agents. The agents connect to the server, get a list of stuff to monitor, then poll at their given intervals and report back to the server with the data. The server then determines whether the values are within threshold and fires an alarm if they are not. I recently finished the alarm display page. The system works on the basis of minor, major and critical alarms, so you set the low/high thresholds for each (or you can omit some and use just minor, major or critical, or any combination of the three). All configuration is done at the server by editing the config file.
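A rough sketch of how that minor/major/critical threshold check could work, written in Python for brevity. This is not the actual EnviroState code; the names `Band` and `classify` and the example voltages are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Band:
    """Low/high limits for one severity level; either side may be omitted."""
    low: Optional[float] = None
    high: Optional[float] = None

    def breached(self, value: float) -> bool:
        # A value breaches the band if it falls below low or rises above high.
        if self.low is not None and value < self.low:
            return True
        if self.high is not None and value > self.high:
            return True
        return False


def classify(value: float,
             minor: Optional[Band] = None,
             major: Optional[Band] = None,
             critical: Optional[Band] = None) -> str:
    """Return the most severe level whose band is breached, or "ok"."""
    # Check the most severe level first so it wins over the lesser ones.
    for name, band in (("critical", critical), ("major", major), ("minor", minor)):
        if band is not None and band.breached(value):
            return name
    return "ok"


# Hypothetical battery thresholds: alarms only on the low side.
state = classify(11.4,
                 minor=Band(low=12.0),
                 major=Band(low=11.5),
                 critical=Band(low=11.0))
print(state)
```

Any level left as `None` is simply skipped, which mirrors the idea of being able to configure only some of the three severities.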

Currently it's only been tested on Linux, but I will eventually be testing it on Windows, starting with making sure it even compiles. :P Each monitor point is simply the output of a shell command. So using sed, grep, etc. you can pretty much grab any variable data you want.
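To illustrate the "a monitor point is just the output of a shell command" idea, here is a minimal sketch in Python. The function name `read_monitor` and the example command are assumptions for illustration, not how the agent is actually implemented.

```python
import subprocess


def read_monitor(command: str) -> float:
    """Run a shell command and interpret its output as a single number."""
    result = subprocess.run(
        command,
        shell=True,           # the command may use pipes, sed, grep, etc.
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode bytes to str
        check=True,           # raise if the command exits non-zero
    )
    return float(result.stdout.strip())


# Hypothetical monitor: strip a label off a line with sed to leave the number.
value = read_monitor("echo 'volts=12.6' | sed 's/volts=//'")
print(value)
```

Anything that can be boiled down to one number on stdout fits this shape, which is what keeps the concept simple.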

For hydro fail, voltage and other physical monitor points, I use an Arduino circuit board with appropriate sensors. A Python script connects to the serial port (it was easier to do in Python than C++) and outputs all the alarm points to a text file. EnviroState simply parses the values out of the text file. So with this rather simple concept, pretty much anything can be monitored.
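As a sketch of the parsing side, assuming the text file holds one `name=value` pair per line (the real file format is not shown here, so that layout and the point names below are assumptions):

```python
def parse_points(text: str) -> dict:
    """Parse 'name=value' lines into a dict of floats, skipping bad lines."""
    points = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # ignore blank lines and anything without a value
        name, _, raw = line.partition("=")
        try:
            points[name.strip()] = float(raw)
        except ValueError:
            continue  # ignore values that are not numeric
    return points


# Hypothetical file contents as the serial script might write them.
sample = "hydro_ok=1\nbattery_volts=12.6\n\ngarbage line\n"
print(parse_points(sample))
```

Skipping malformed lines rather than failing means a half-written line from the serial side does not take down the reader.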

Here is a sample email, and the alarm list:



I just pulled the plug on my server rack, running on batteries now.   

I still need to add more features but it's coming along.  One feature I want to add is the ability for an alarm point to issue a command when it reaches a certain state.  So when battery voltage reaches critical, for example, it should issue a command that shuts down all the servers.
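That feature does not exist yet, but the idea could be sketched roughly like this in Python. Everything here is hypothetical: the `ACTIONS` table, the `on_state_change` function, and the harmless `echo` standing in for a real shutdown command.

```python
import subprocess

# Hypothetical mapping of (monitor point, alarm state) to a command to run.
ACTIONS = {
    ("battery_volts", "critical"): "echo shutting down servers",
}


def on_state_change(point, old_state, new_state, actions=ACTIONS):
    """Run the configured command when a point enters a new state.

    Returns the command's output, or None if nothing was run.
    """
    if new_state == old_state:
        return None  # only fire on a transition, not on every poll
    command = actions.get((point, new_state))
    if command is None:
        return None  # no action configured for this point/state
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout.strip()


print(on_state_change("battery_volts", "major", "critical"))
```

Firing only on the transition matters here: a shutdown command should run once when the voltage goes critical, not on every polling interval while it stays there.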

I also plan to add a hydrogen sensor and monitor that too, just for peace of mind. If the charger goes bananas it could potentially cause the batteries to release hydrogen.

I plan to release a beta of this probably in the next few months or so.   It will be released as open source.

Thursday, January 3, 2013

Now to nervously wait as this raid does a live rebuild

Added 2 drives to my RAID 5 array, going from six 1TB drives to eight. Rebuilding now. This many drives in RAID 5 makes me a little nervous. My kernel is too old to grow to RAID 6. I'm sure I'll be fine, but it's still nerve-wracking, especially with my luck.


[root@borg temp]# mdadm --detail /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Sat Sep 20 02:15:28 2008
     Raid Level : raid5
     Array Size : 4883799680 (4657.55 GiB 5001.01 GB)
  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
   Raid Devices : 6
  Total Devices : 8
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Thu Jan  3 21:13:05 2013
          State : clean
 Active Devices : 6
Working Devices : 8
 Failed Devices : 0
  Spare Devices : 2

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 11f961e7:0e37ba39:2c8a1552:76dd72ee
         Events : 0.1440052

    Number   Major   Minor   RaidDevice State
       0       8       96        0      active sync   /dev/sdg
       1       8       16        1      active sync   /dev/sdb
       2       8       32        2      active sync   /dev/sdc
       3       8       48        3      active sync   /dev/sdd
       4       8      112        4      active sync   /dev/sdh
       5       8       80        5      active sync   /dev/sdf

       6       8      128        -      spare   /dev/sdi
       7       8      160        -      spare   /dev/sdk
[root@borg temp]# 
[root@borg temp]# 
[root@borg temp]# df -hl
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda3             433G   12G  400G   3% /
/dev/sda1             190M   25M  156M  14% /boot
tmpfs                 3.8G   48K  3.8G   1% /dev/shm
/dev/md0              4.5T  3.3T  995G  78% /raid1
[root@borg temp]# 
[root@borg temp]# 
[root@borg temp]# 
[root@borg temp]# cat simplebenchmark.sh 
echo
dd if=/dev/zero of=test.bin bs=5000000 count=1000
echo
echo
dd of=/dev/null if=test.bin bs=5000000
echo

[root@borg temp]# ./simplebenchmark.sh 

1000+0 records in
1000+0 records out
5000000000 bytes (5.0 GB) copied, 24.2193 s, 206 MB/s


1000+0 records in
1000+0 records out
5000000000 bytes (5.0 GB) copied, 3.03028 s, 1.7 GB/s

[root@borg temp]# 
[root@borg temp]# 
[root@borg temp]# 
[root@borg temp]# 
[root@borg temp]# mdadm --grow /dev/md0 --raid-devices=8
mdadm: Need to backup 1344K of critical section..
mdadm: ... critical section passed.
[root@borg temp]# 
[root@borg temp]# 
[root@borg temp]# 
[root@borg temp]# 
[root@borg temp]# 
[root@borg temp]# mdadm --detail /dev/md0
/dev/md0:
        Version : 0.91
  Creation Time : Sat Sep 20 02:15:28 2008
     Raid Level : raid5
     Array Size : 4883799680 (4657.55 GiB 5001.01 GB)
  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
   Raid Devices : 8
  Total Devices : 8
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Thu Jan  3 21:16:08 2013
          State : clean, recovering
 Active Devices : 8
Working Devices : 8
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

 Reshape Status : 0% complete
  Delta Devices : 2, (6->8)

           UUID : 11f961e7:0e37ba39:2c8a1552:76dd72ee
         Events : 0.1440210

    Number   Major   Minor   RaidDevice State
       0       8       96        0      active sync   /dev/sdg
       1       8       16        1      active sync   /dev/sdb
       2       8       32        2      active sync   /dev/sdc
       3       8       48        3      active sync   /dev/sdd
       4       8      112        4      active sync   /dev/sdh
       5       8       80        5      active sync   /dev/sdf
       6       8      128        6      active sync   /dev/sdi
       7       8      160        7      active sync   /dev/sdk
[root@borg temp]# 


It will be interesting to see the IO speed after this. I wonder if I've already reached a plateau, or if I can actually get more out of it. I'm mostly doing this for disk space and not performance, mind you, but it's always nice to get some extra performance too.

There's also a hot spare that's not added, but when I was doing testing it had a lot of SMART errors, so I think I'll have to RMA it. I'm using all WD Black 1TB drives. I missed out on the cheap 3TB drives pre-flood, but at slightly under 100 bucks these 1TB drives were hard to resist. NCIX had a sale. All the slots in my server are full now. This will give me about 1.7TB of extra space. By the time I remotely come close to using all that, I'll probably have built a SAN anyway. I rarely delete data, I just add more disk space. :D
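For anyone wanting to check the raw numbers, RAID 5 usable capacity is (number of drives - 1) times the per-drive size, since one drive's worth of space goes to parity. A quick sketch in Python using the "Used Dev Size" from the mdadm output above (the function name is made up for illustration); what the filesystem ends up reporting may come out a little lower than the raw figure.

```python
# Per-device size in KiB, taken from "Used Dev Size" in the mdadm output.
USED_DEV_SIZE_KIB = 976759936


def raid5_capacity_kib(devices: int, dev_size_kib: int = USED_DEV_SIZE_KIB) -> int:
    """Usable RAID 5 capacity: one device's worth of space holds parity."""
    return (devices - 1) * dev_size_kib


before = raid5_capacity_kib(6)  # matches the "Array Size" mdadm reports
after = raid5_capacity_kib(8)
gained_gib = (after - before) / 1024**2  # KiB -> GiB

print(f"before: {before} KiB")
print(f"after:  {after} KiB")
print(f"gained: {gained_gib:.2f} GiB raw")
```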

Tuesday, January 1, 2013

Merry Christmas, and Happy New Year!

Have not updated this in a while, so thought I'd wish everyone a very Merry Christmas, and Happy New Year!

Christmas is now officially over, but remember, the everlasting gift of God is never over. Jesus Christ was sent to walk the Earth, and to take our place on the cross. As sinners we were all destined for hell. All we have to do is accept we are sinners, have faith and accept Jesus as our Savior.

A quick update on stuff I've been up to. First off, on the networking side of things, I bought myself a managed gigabit switch off eBay as well as wireless access points. I'm only using one at the moment, given it covers the whole house and then some.





It's nice to be able to do VLANs and have enough ports on a single switch without daisy chaining a bunch of small ones. Eventually I also want to set up the wifi controller software so I can do more advanced stuff with the AP. Currently it works standalone. I also want to VLAN it so it can have a few different SSID networks.

Secondly, I also bought myself a Wii U. This is my first console since the original Playstation! Figured it would be something different to play with. I had been wanting to buy a Wii and figured I'd just wait for this to come out, so I preordered and got it in November. Early Christmas gift!



Lastly, I recently bought myself a 3rd monitor for my workstation. Unfortunately Linux seems to have really poor support for more than 2 monitors, so it does not work quite right and can be flaky at times. The 3rd monitor has its own X session, so I actually can't drag stuff over. But for what I tend to use it for, it's not too bad. I usually leave a VNC session to my main server up on it, and put my debugging consoles and whatnot there when coding.



I really need to finish that custom desk once summer arrives, so I can have more room!

Given I gave up on my UO game server, I currently don't really have a major project on the go, but have been working on and off (between Minecraft and Wii!) on an environmental/server monitoring system. What started off as a very basic tool to monitor home temperatures, turned into a more advanced fully configurable agent based monitoring system. I have the email notifications working nicely and while I still need to iron out some bugs such as a random segfault on exit, things are coming together and I have been using it in production.

The idea behind my system is that it's up to the admin to create monitors, and monitors are simply the output of a command. This simplifies a lot of things, as you can set up monitors for pretty much anything you can imagine, as long as you can get a numerical output out of it. In Linux this is fairly easy given powerful tools like sed and grep. Add Arduino to the mix and you can monitor pretty much anything physical too. Currently at home I can monitor hydro power state, backup battery voltage and temperatures, and I also have a bunch of basic server-related monitors set up. More will be added later, such as hard drive temperature or error rate. There is really no limit to what can be monitored with this system. I've been wanting to set up water sensors as well as more smoke detectors and hook them up to this system as well.

Heck, I can even monitor mouse traps. No, really.



3 sets of 2 wired in a normally closed configuration.





I had a few mice in my attic and rather than having to move all the stuff out of the closet to go check the traps all the time, I just have to check them if I get an email alert. Thankfully I seem to have found where they were coming in from, but I'll know for sure in summer.

This app will probably eventually be released to the public, once it's polished better. I will probably also add an "action" feature to it, where a certain alert can trigger a script. For example, a critical low battery voltage alarm could trigger a shutdown of all equipment.

Once this project is done, I've been toying with the idea of making a full-blown server management system, something like cPanel but free. It may be easier to make it into my own Linux distro, but I'm not sure yet. Simply put, it will be a web-based point-and-click system to set up multiple web hosting/file sharing servers. You simply enable the modules you want to use and can have either a full-blown web hosting environment, or perhaps just a file server for a local network.

It will do all the dirty work for you, such as configuring virtual hosts and various fail over features of services such as DNS. It will more or less be a wrapper around existing software.

I've also been toying with the idea of making an MMO, but that's probably far-fetched. I'll finish the easier stuff first. I also need to look into updating my websites. Some have not been touched in nearly a decade! I may consolidate some of the forums I manage, as well. Should be a fun 2013!