Agent Architecture

Overview

The General Architecture figure depicts the structure of the Abilisoft Agent.

[Figure: General Architecture (_images/architecture.png)]

The Abilisoft Agent executes as a single process, although from time to time it may spawn sub-processes to perform certain tasks. It operates on a proprietary store where configuration data, acquired data and processed data are kept. All the software (third party and otherwise) required by the Agent is contained in the installation directory; there are no other software dependencies. The Agent is usually installed and run using an init.d script so that it is always restarted when the server reboots.

Runtime Configuration

Runtime configuration settings govern the general behaviour of the Agent, as opposed to defining what the Agent monitors on the server and how. They all have “built-in” defaults, which can be overridden in a number of ways:

  • By specifying new settings in the $AS_HOME/etc/asagent.conf file (on Windows the registry is the default settings location, but conf files can optionally be used).
  • By setting the relevant environment variable.
  • By passing in the relevant command line argument.

Settings are applied in priority order (1 being the highest priority):

  1. Setting specified by command line argument
  2. Setting specified in an environment variable
  3. Setting specified in the asagent.conf file
  4. Built-in default setting

For example, an environment variable setting will override the built-in default but not an alternative setting value passed in as a command line argument. This provides a versatile and flexible configuration mechanism that suits most environments.
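The priority order above can be sketched in a few lines of Python. This is an illustrative sketch only, not the Agent's implementation: the function name `resolve_setting`, the dictionaries standing in for each settings source, and the example values are all assumptions made for the illustration.

```python
# Hypothetical built-in defaults; real default values are not given in this document.
BUILTIN_DEFAULTS = {"manifest_uri": "file:///example/default-manifest"}


def resolve_setting(name, cli=None, env=None, conf=None, defaults=None):
    """Return (value, source) for a setting, honouring the documented priority."""
    sources = [
        ("command line", cli or {}),          # priority 1
        ("environment", env or {}),           # priority 2
        ("asagent.conf", conf or {}),         # priority 3
        ("built-in default",                  # priority 4
         BUILTIN_DEFAULTS if defaults is None else defaults),
    ]
    for source_name, values in sources:
        if name in values:
            return values[name], source_name
    raise KeyError(f"unknown setting: {name}")


# An environment variable overrides the conf file and the built-in default...
print(resolve_setting(
    "manifest_uri",
    env={"manifest_uri": "file:///example/from-env"},
    conf={"manifest_uri": "file:///example/from-conf"},
))
# ...but not a value passed in as a command line argument.
print(resolve_setting(
    "manifest_uri",
    cli={"manifest_uri": "file:///example/from-cli"},
    env={"manifest_uri": "file:///example/from-env"},
))
```

Each source is keyed by the plain setting name here to keep the sketch short; translating to the command line and environment variable forms is a separate concern.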

The settings file has an inclusion mechanism, allowing asagent.conf to be defined in terms of other configuration files.

Alternatively, settings can be modified remotely via the AAPI, and the new settings are (optionally) persisted to the Agent’s settings file. For example, it is possible to update the manifest_uri setting in a deployed and executing agent, and then (via the AAPI) invoke a manifest reload.

On UNIX platforms all settings are described in the installed man pages. All setting names follow an intuitive format:

manifest_uri        Configuration file setting name
--manifest_uri      Command line parameter
AS_MANIFEST_URI     Environment variable
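The naming format can be expressed as a small helper. This sketch assumes that the pattern shown for manifest_uri (a `--` prefix for the command line, an `AS_` prefix and upper case for the environment) holds for every setting; the function name `setting_names` is hypothetical.

```python
def setting_names(name):
    """Map a configuration file setting name to its other two forms."""
    return {
        "conf": name,                 # as written in asagent.conf
        "cli": "--" + name,           # command line parameter
        "env": "AS_" + name.upper(),  # environment variable
    }


print(setting_names("manifest_uri"))
# {'conf': 'manifest_uri', 'cli': '--manifest_uri', 'env': 'AS_MANIFEST_URI'}
```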

On Windows the runtime configuration is defined in the registry (by default). See Runtime Configuration for details on the available settings for both UNIX and Windows platforms.

Agent Logging

The Agent uses a logging mechanism in which runtime configuration settings define in advance how much file-system space is allocated to the log files (for example, the maximum log file size and the number of rolled log files). A wide range of log levels is also available.
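The idea of bounding log space in advance can be illustrated with Python's standard `logging.handlers.RotatingFileHandler`. This is an analogy rather than the Agent's own logging code, and the size and count values below are assumptions chosen for the example, not Agent defaults.

```python
import logging
import logging.handlers

MAX_BYTES = 1024 * 1024   # assumed maximum size of one log file
BACKUP_COUNT = 4          # assumed number of rolled log files to keep


def make_bounded_handler(path, max_bytes=MAX_BYTES, backup_count=BACKUP_COUNT):
    """Create a handler that rolls the log file once it would exceed max_bytes."""
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backup_count, delay=True
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    return handler


def worst_case_bytes(max_bytes=MAX_BYTES, backup_count=BACKUP_COUNT):
    """Approximate upper bound on disk use: the live file plus every rolled file."""
    return max_bytes * (backup_count + 1)


print(worst_case_bytes())  # 5242880, i.e. about 5 MiB
```

Because both values are fixed before the process starts, the space needed for logs can be budgeted up front.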

Agent Components

The internal components of the Agent are described below.

Agent Store

The Agent Store provides persistent storage for data acquired by the monitoring engine. The store maintains a window of sample data for each sample type. The window size is configurable in the Agent runtime configuration settings as a maximum sample age. When the monitor window is full it will wrap, overwriting the oldest sample data. Sample data in the store can be viewed via the AAPI (see AAPI).
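A window bounded by maximum sample age can be sketched as follows. The store itself is proprietary, so this is only an illustration of the behaviour described above; the class name `SampleWindow` and the use of seconds for ages are assumptions.

```python
from collections import deque


class SampleWindow:
    """Keep only the samples that are no older than max_age_seconds."""

    def __init__(self, max_age_seconds):
        self.max_age = max_age_seconds
        self._samples = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        """Store a new sample, then discard any that have aged out."""
        self._samples.append((timestamp, value))
        while self._samples and timestamp - self._samples[0][0] > self.max_age:
            self._samples.popleft()

    def values(self):
        return [value for _, value in self._samples]


window = SampleWindow(max_age_seconds=60)
for ts, value in [(0, "a"), (30, "b"), (60, "c"), (90, "d")]:
    window.add(ts, value)
print(window.values())  # ['b', 'c', 'd']: the sample from t=0 has aged out
```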

After an Agent restart, the most recent sample data for all monitors is retrieved from the store. Depending on the age of the samples and the runtime setting analysis_prime_maxage, the sample values are used to prime the analysis engine pipelines, ensuring that any changes that took place during agent down-time are (within reason) considered.
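The priming decision might look like the sketch below. It assumes that analysis_prime_maxage is a maximum sample age expressed in seconds, which the setting name suggests but this document does not state; the function name and data layout are hypothetical.

```python
def select_priming_samples(latest, now, prime_maxage):
    """Pick the stored samples that are recent enough to prime the pipelines.

    latest maps a monitor name to its most recent (timestamp, value) pair.
    """
    return {
        monitor: value
        for monitor, (timestamp, value) in latest.items()
        if now - timestamp <= prime_maxage
    }


stored = {"disk_usage": (1000, 91.5), "load_average": (400, 0.7)}
print(select_priming_samples(stored, now=1100, prime_maxage=300))
# {'disk_usage': 91.5}: the load_average sample is 700 seconds old and is skipped
```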

Monitoring

The monitoring engine gathers data about the host and the applications running on it. It performs monitoring by maintaining a set of monitors. Monitor types differ in how configurable they are, but all can be enabled or disabled, have their periodicity customised, and have observations placed on them so that notifications are created when certain conditions are met.

Sample results gathered by a monitor can have multiple facets: attribute values that make up the data obtained by the sample (the facets available may be platform dependent). These values can be selected for analysis in the monitoring configuration. For example, a disk usage sample result has the facets usedBytes, freeBytes, totalBytes, percentUsed and percentFree. A threshold can be set on any of these values, which makes it possible, for example, to set an upper threshold limit on percentUsed (say 95%) or a lower threshold limit on freeBytes (say 500 MiB).
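The disk usage facets and the two example thresholds can be sketched with the Python standard library. The facet names are taken from the text above; the functions `disk_usage_facets` and `breaches` are hypothetical and do not reflect how the Agent samples disks or evaluates thresholds internally.

```python
import shutil

MIB = 1024 ** 2
GIB = 1024 ** 3


def disk_usage_facets(path="."):
    """Sample disk usage for path and return it as a dict of facets."""
    usage = shutil.disk_usage(path)
    total = usage.total or 1  # guard against pseudo file systems reporting 0
    return {
        "usedBytes": usage.used,
        "freeBytes": usage.free,
        "totalBytes": usage.total,
        "percentUsed": 100.0 * usage.used / total,
        "percentFree": 100.0 * usage.free / total,
    }


def breaches(facets, facet, upper=None, lower=None):
    """True if the facet is above the upper limit or below the lower limit."""
    value = facets[facet]
    if upper is not None and value > upper:
        return True
    return lower is not None and value < lower


# A fixed sample so the outcome is predictable: 96 GiB used of 100 GiB.
sample = {
    "usedBytes": 96 * GIB, "freeBytes": 4 * GIB, "totalBytes": 100 * GIB,
    "percentUsed": 96.0, "percentFree": 4.0,
}
print(breaches(sample, "percentUsed", upper=95))       # True
print(breaches(sample, "freeBytes", lower=500 * MIB))  # False
print(sorted(disk_usage_facets()))                     # facet names for a live sample
```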

The monitoring engine will (with zero configuration) collect essential host health data and store it in the Agent Store. A wide range of host health samples can be taken; however, some are dependent on the platform the Agent is installed on. The host health samples fall into four main categories: CPU, DISK, MEMORY and MONITOR. Samples include:

  • CPU:
    • Overall CPU Usage (total, system, user, idle)
    • Per CPU Usage (total, system, user, idle)
    • Load Average (1, 5, 15 minutes)
  • DISK:
    • Disk Usage (used/free percent, total/used/free bytes)
  • MEMORY:
    • Memory Usage (used/free percent, total/used/free bytes)
    • Swap Usage (used/free percent, total/used/free bytes)
    • Buffer Cache (bytes)
  • MONITOR:
    • Up Time (seconds, delta time stamp)
    • Log Monitor (must be instanced). For example, it can be used to check the syslog, messages and auth log files for specific occurrences.
    • File Check (must be instanced). For example, it can be used to check the passwd/shadow database for changes.

Analysis

The analysis engine processes sample data in order to determine whether an observation should be raised. A monitor can have zero or more observations defined for it. In the monitoring configuration, an observation specifies the test to perform and the parameters for that test. Test parameters can be literal values, facet values or functions on facet values (for example, the last() function returns the value of a facet from the last sampling period). If a test evaluates to true, the observation fires; it is then passed to the notification engine and also written to the agent store. Refer to Tests for a full description of all available test types.
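The three kinds of test parameter (literal, facet value, function on a facet value) can be sketched as below. Only last() is named in this document; the helper names `facet`, `resolve` and `observation_fires`, and the comparison operators, are assumptions for illustration and are not the Agent's test syntax.

```python
import operator

OPERATORS = {">": operator.gt, "<": operator.lt, "==": operator.eq}


def facet(name):
    """Parameter that resolves to the facet's value in the current sample."""
    return lambda current, previous: current[name]


def last(name):
    """Parameter that resolves to the facet's value from the last sampling period."""
    return lambda current, previous: previous[name]


def resolve(param, current, previous):
    """Literals are used as-is; facet and function parameters are looked up."""
    return param(current, previous) if callable(param) else param


def observation_fires(left, op, right, current, previous):
    """Evaluate one test; a true result means the observation fires."""
    return OPERATORS[op](
        resolve(left, current, previous), resolve(right, current, previous)
    )


previous = {"percentUsed": 90.0}
current = {"percentUsed": 96.0}
# Facet value compared with a literal threshold.
print(observation_fires(facet("percentUsed"), ">", 95, current, previous))  # True
# Facet value compared with its value from the last sampling period.
print(observation_fires(facet("percentUsed"), "<", last("percentUsed"),
                        current, previous))                                 # False
```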

Notification

The notification engine processes observations created by the analysis engine. If a sample has any observations defined for it then it is possible to define one or more associated actions.

Example action types are:

  • Smtp: Sends an SMTP mail message to one or more recipients.
  • Trap (SNMP Trap): Raises a trap to an SNMP manager.
  • Command: Invokes a command line process.
  • Control: Starts or stops selected monitors.

Actions are typically used to send a message that brings observations to the attention of an Operator. However, they can also be employed as an integration tool (using the Command action) to interact with another system or to implement a corrective action. For example, a common practice is to run a custom disk clean-up script in response to a disk usage threshold breach. Additionally, actions can be used to adaptively control the monitoring behaviour of the agent.
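A Command-style action amounts to running an external process when an observation fires. The sketch below shows that idea with Python's `subprocess` module; the function name `run_command_action`, the timeout, and the stand-in clean-up command are assumptions, not the Agent's action configuration.

```python
import subprocess
import sys


def run_command_action(argv, timeout=30):
    """Run a command line process and return (exit code, captured output)."""
    completed = subprocess.run(
        argv, capture_output=True, text=True, timeout=timeout
    )
    return completed.returncode, completed.stdout.strip()


# Stand-in for a custom disk clean-up script; a real deployment would
# name its own script here.
cleanup_argv = [sys.executable, "-c", "print('cleaned')"]

observation_fired = True  # e.g. a disk usage threshold breach
if observation_fired:
    print(run_command_action(cleanup_argv))  # (0, 'cleaned')
```

Passing the command as an argument list rather than a shell string avoids shell quoting problems when values from an observation are included in the command.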

AAPI

The Agent’s Application Programmer Interface (AAPI) provides a mechanism to interact with the agent securely and with authority. The AAPI is designed to be exposed via various mechanisms, but currently only XML-RPC is supported. The AAPI methods enable the following to be accomplished programmatically:

  • Retrieve monitoring related data.
  • Retrieve the agent’s status, including a history of its CPU and memory usage.
  • View and update the agent’s runtime configuration.
  • View the agent’s monitoring configuration.
  • Prompt the agent to check for the availability of new monitoring configuration.
  • Control the agent’s log levels at runtime.

Refer to the XML RPC API section for a full description of the AAPI methods.
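To give a feel for what an XML-RPC interaction involves, the sketch below uses Python's standard `xmlrpc.client` module to encode and decode a call. The method name `set_setting`, the parameter values and the endpoint passed to `call_agent` are hypothetical placeholders; the real method names, endpoint and authentication requirements are those given in the XML RPC API section.

```python
import xmlrpc.client

# Hypothetical method name and parameters, for illustration only.
METHOD = "set_setting"
PARAMS = ("manifest_uri", "file:///example/manifest")

# Encode the call as an XML-RPC request body, as a client would send it.
request_xml = xmlrpc.client.dumps(PARAMS, methodname=METHOD)
print(request_xml)

# Decode it again to show that the method name and parameters round-trip.
decoded_params, decoded_method = xmlrpc.client.loads(request_xml)
print(decoded_method, decoded_params)


def call_agent(url, method, *params):
    """Send one XML-RPC call to an agent listening at url (not invoked here)."""
    proxy = xmlrpc.client.ServerProxy(url)
    return getattr(proxy, method)(*params)
```

`call_agent` is defined but deliberately not called, since it needs a running agent and whatever credentials the AAPI requires.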