ERM

Extensible Resource Manager

What is this?

ERM is a general-purpose network management engine written in Java. I created it because I found other open-source tools such as Nagios or OpenNMS awkward at best. I also find it easier to write monitoring extensions for Java tools (WebLogic, JBoss, Tomcat, standalone applications...) in Java.

Requirements

Anything that can run a Java virtual machine, version 1.4 or later.

Installing

Unzip the downloaded file. What you get should look like this:

        lib/
        dist/
        conf/
        erm.sh
        erm.cmd
        lcp.cmd

These are the required JAR files, the XML configuration files and some helper scripts to run ERM on Unix and Windows. That is essentially the whole installation. On Unix, you also have to run chmod +x erm.sh before you can start it.

How to use it

This is quite simple. The conf directory contains 3 XML files holding all the required configuration data. Edit them according to your needs, then run ERM and forget about it.

monitoring.xml: a list of probes you want to launch on your network and their execution interval.

                                 class of the probe
        <probe interval="120">net.sf.erm.probe.weblogic.WLMemoryMonitoringProbe</probe>
                         ^^^ number of seconds between two executions of this probe

Probes are executed on all devices in the network that have support for them.
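For anyone writing tooling around this file, here is a minimal sketch of reading probe entries with the JDK's built-in DOM parser. The wrapping <monitoring> root element and the ProbeEntry and MonitoringConfigReader names are assumptions made for illustration; ERM's own loader may work differently. The sketch also uses generics, so it needs a more recent JVM than the 1.4 baseline.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical holder for one <probe> entry; not an ERM class.
class ProbeEntry {
    final String className;
    final int intervalSeconds;

    ProbeEntry(String className, int intervalSeconds) {
        this.className = className;
        this.intervalSeconds = intervalSeconds;
    }
}

// Hypothetical reader; the real ERM configuration loader may differ.
class MonitoringConfigReader {
    // Parses the XML text and returns one ProbeEntry per <probe> element.
    static List<ProbeEntry> read(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("probe");
            List<ProbeEntry> entries = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element probe = (Element) nodes.item(i);
                // The element text is the probe class, the attribute is the interval in seconds.
                entries.add(new ProbeEntry(
                        probe.getTextContent().trim(),
                        Integer.parseInt(probe.getAttribute("interval"))));
            }
            return entries;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read probe configuration", e);
        }
    }
}
```

Each returned entry carries the probe class name and its interval, which is all a scheduler would need to know about a probe.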

netmap.xml: the full list of your network elements and, for each one, the tasks it supports and on which port.

        <device name="192.168.200.253">
            <module port="161">
                <supported>net.sf.erm.task.Mib2Task</supported>
                <supported>net.sf.erm.task.OldCiscoConfigCopyTask</supported>
                <supported>net.sf.erm.task.OldCiscoImageCopyTask</supported>
            </module>
        </device>

config.xml: everything that does not fit in the 2 previous files. This includes SNMP communities, the TFTP server root directory, the data collector in use and more.

How does it work? What can I put in the configuration files?

The work of ERM is split into 3 layers: tasks, probes and data collector.

A task represents a single logical operation that can be performed on a device: either collecting data from it or acting on it. Examples of tasks are the OldCiscoConfigCopyTask, which orders a Cisco router via SNMP to upload its running config to the embedded TFTP server, and the WLMemoryTask, which fetches the maximum VM heap size and the current free heap memory from a WebLogic server. Tasks just know HOW to perform their action.

The data collector is where the collected information is sent. It is responsible for storing the data in some repository. The repository can be an RDBMS, a filesystem or, more simply, the stdout stream of the process or even an email recipient. The data collector knows WHAT to do with the collected data. Detected outages are also "collected" by the data collector.

Probes are scheduled actions. At the specified interval, they run on every device that is able to support them and send any collected information to the data collector. Probes depend on one or more tasks to be able to run. A device that supports all the tasks required by a probe will have that probe automatically run on it. Probes know WHEN to perform operations.
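The matching rule between probes and devices can be sketched as a simple set inclusion test. This is only an illustration: ProbeMatcher, its method and the second device address are hypothetical and not taken from ERM's code base.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the probe/device matching rule; not ERM's actual classes.
class ProbeMatcher {
    // Returns the devices that support every task the probe requires.
    static List<String> eligibleDevices(Set<String> requiredTasks,
                                        Map<String, Set<String>> supportedTasksByDevice) {
        List<String> eligible = new ArrayList<>();
        for (Map.Entry<String, Set<String>> device : supportedTasksByDevice.entrySet()) {
            if (device.getValue().containsAll(requiredTasks)) {
                eligible.add(device.getKey());
            }
        }
        return eligible;
    }

    public static void main(String[] args) {
        // Device name -> tasks declared as <supported> for that device.
        Map<String, Set<String>> netmap = new LinkedHashMap<>();
        netmap.put("192.168.200.253", new HashSet<>(Arrays.asList(
                "net.sf.erm.task.Mib2Task",
                "net.sf.erm.task.OldCiscoConfigCopyTask")));
        netmap.put("192.168.200.10", new HashSet<>(Arrays.asList( // hypothetical device
                "net.sf.erm.task.weblogic.WLMemoryTask")));

        // Tasks a memory monitoring probe would depend on.
        Set<String> required = new HashSet<>(Arrays.asList(
                "net.sf.erm.task.weblogic.WLMemoryTask"));

        System.out.println(eligibleDevices(required, netmap));
    }
}
```

With the sample data above, only 192.168.200.10 is selected, because the first device does not declare the WLMemoryTask.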

ERM is just a collection of services (such as an SNMP engine or a TFTP server) that are able to run tasks and send their results wherever they are needed. The tasks currently included are:

net.sf.erm.task.Mib2Task: collects sysName and sysDescr via SNMP
net.sf.erm.task.OldCiscoConfigCopyTask: orders a Cisco device to upload its running config via TFTP. Should work on any Cisco device but is quite buggy since it uses the OLD-CISCO-SYS MIB. Not recommended for production use.
net.sf.erm.task.OldCiscoImageCopyTask: orders a Cisco device to upload its IOS image via TFTP. Same remarks as for the config copy task.
net.sf.erm.task.weblogic.WLMemoryTask: collects VM heap memory size via SNMP
net.sf.erm.task.weblogic.WLExecuteQueueTask: collects various WebLogic queue stats via SNMP
net.sf.erm.task.weblogic.WLServletTask: collects runtime info about servlets and JSPs via SNMP
net.sf.erm.task.BasicTcpConnectivityTask: checks that a TCP port is open
net.sf.erm.task.Jdk5MemoryTask: monitors JVM memory using the JDK 5 embedded SNMP agent. See here for more details
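
To give an idea of how small a task can be, here is a minimal sketch of a TCP connectivity check in the spirit of the BasicTcpConnectivityTask. It is not the ERM implementation: the TcpCheck class, the isPortOpen method and the timeout parameter are assumptions for illustration, and the try-with-resources syntax needs a more recent JVM than the 1.4 baseline.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper; not part of ERM.
class TcpCheck {
    // Returns true if a TCP connection to host:port succeeds within the timeout.
    static boolean isPortOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused connection, timeout or unresolved host: report the port as closed.
            return false;
        }
    }
}
```

A call such as TcpCheck.isPortOpen("192.168.200.253", 80, 2000) would then report whether that port accepts connections within 2 seconds.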

Included probes are:

net.sf.erm.probe.weblogic.WLMemoryMonitoringProbe: simply runs the WLMemoryTask
net.sf.erm.probe.weblogic.WLExecuteQueueMonitoringProbe: simply runs the WLExecuteQueueTask
net.sf.erm.probe.weblogic.WLServletMonitoringProbe: simply runs the WLServletTask
net.sf.erm.probe.TcpConnectivityMonitoringProbe: simply runs the BasicTcpConnectivityTask
net.sf.erm.probe.Jdk5MemoryMonitoringProbe: simply runs the Jdk5MemoryTask

and available collectors are:

net.sf.erm.service.monitoring.ConsoleDataCollector: outputs collected data and outages to stdout
net.sf.erm.service.monitoring.JDBCDataCollector: inserts collected data and outages into a JDBC database. Database connection details must be set in config.xml.

The schema expected by the JDBCDataCollector is:

CREATE TABLE "OUTAGES" 
(
  "DEVICE"	 VARCHAR(80) NOT NULL,
  "CATEGORY"	 VARCHAR(80) NOT NULL,
  "STAT_NAME"	 VARCHAR(80) NOT NULL,
  "SUBJECT"	 VARCHAR(250) NOT NULL,
  "REASON"	 VARCHAR(4000),
  "COLLECT_DATE"	 TIMESTAMP NOT NULL
);

CREATE TABLE "STATS" 
(
  "DEVICE"	 VARCHAR(80) NOT NULL,
  "CATEGORY"	 VARCHAR(80) NOT NULL,
  "STAT_NAME"	 VARCHAR(80) NOT NULL,
  "SUBJECT"	 VARCHAR(250) NOT NULL,
  "VALUE"	 INTEGER,
  "COLLECT_DATE"	 TIMESTAMP NOT NULL
);
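
As an illustration of how a row could be written to the STATS table with plain JDBC, here is a hedged sketch. The StatsWriter class and its method are hypothetical and do not reflect how the JDBCDataCollector is actually implemented; obtaining the Connection (driver, URL, credentials) is left out on purpose.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.sql.Types;

// Hypothetical writer for the STATS table; not the JDBCDataCollector itself.
class StatsWriter {
    static final String INSERT_STAT =
            "INSERT INTO \"STATS\" "
            + "(\"DEVICE\", \"CATEGORY\", \"STAT_NAME\", \"SUBJECT\", \"VALUE\", \"COLLECT_DATE\") "
            + "VALUES (?, ?, ?, ?, ?, ?)";

    // Inserts one collected value; value may be null because the VALUE column is nullable.
    static void insertStat(Connection connection, String device, String category,
                           String statName, String subject, Integer value,
                           Timestamp collectDate) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(INSERT_STAT)) {
            statement.setString(1, device);
            statement.setString(2, category);
            statement.setString(3, statName);
            statement.setString(4, subject);
            if (value == null) {
                statement.setNull(5, Types.INTEGER);
            } else {
                statement.setInt(5, value.intValue());
            }
            statement.setTimestamp(6, collectDate);
            statement.executeUpdate();
        }
    }
}
```

A prepared statement keeps the values out of the SQL text, so device names or subjects containing quotes do not break the insert.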

What if I want to write tasks, probes or data collectors?

You are welcome to! Start by having a look at the classes already written. These extensions should be as easy to write as possible, so if you run into problems, please let me know.