Implementation

Agent Monitoring Configuration

As discussed in the overview, the Agent’s monitoring behaviour is defined in the Manifest. This is an XML document (or set of documents) with a simple structure. The vocabulary is intuitive and easy to remember. Mechanisms are available to override default settings with as little syntax as possible. This section first describes some general concepts and then describes each element type and how those element types fit together. Not all elements and attributes are described here; a full DTD is provided in $AS_HOME/dtd/AsManifestMaster.dtd when MA is installed.

XML Syntax Primer

Here is a quick refresher on what valid XML looks like in general. Please note that this is not a discussion of the MA configuration language, just XML in general. First, here is some example XML which contains all the major constructs:

<root>
  <!-- This is a comment -->
  <child id="first" attr="val">data</child>
  <child id="second">
    <subchild>data</subchild>
  </child>
  <child id="empty"/>
  <child id="escapes">
     These are in the same order as the ones in the CDATA
     section below: &lt;, &gt;, &amp;, &quot;
  </child>
  <child id="cdata"><![CDATA[
     Unparsed data, can include <, >, &, " but not ]]>
  </child>
  <!--
      This is also a comment
  -->
</root>

Basically XML consists of tags or elements (these two terms are often used interchangeably) which have attributes and data in them. Let’s go through a whirlwind tour of this example.

<root>

This is the first tag in this document, called the root element, with an element name of “root”. XML specifies that there must be exactly one root element; its name doesn’t matter.

<!-- Comment -->

Comments may appear anywhere in a document outside other markup. They are not part of the XML document’s character data but an XML processor may make it possible for an application to retrieve the text of comments. The string -- (double-hyphen) must not occur within comments.

<child id="first">

This is the first child tag. The element has an attribute called id which is set to the value “first”, and a second attribute attr with the value val. Inside it is “data”, which can be any text; XML itself doesn’t care as long as it does not contain any special characters.

Note that XML actually treats the id attribute specially: its value should be unique among all id values in the document. XML also reserves any attribute name starting with xml: as special, so avoid using those.

<child id="second">

This is another child. It shows that any element can have nested children, as many as you like.

<child id="empty"/>

An element with no data can also be written like this. This is semantically the same as:

<child id="empty"></child>

<child id="escapes">

As you may have noticed already, XML treats some characters as special: < > & ". If you want to use these characters you must escape them with the sequences shown. Notice that " can be used freely in element data; the escaped version is only needed inside an attribute value.

<child id="cdata">

When storing data inside an element it can be cumbersome if it contains lots of characters that must be escaped. In this case you can use Unparsed Character Data instead of the default Parsed Character Data. This is done by enclosing the data between <![CDATA[ and ]]>.

To clarify, the actual data contained in this element (stripped of leading and trailing whitespace, which is actually significant in XML) is:

Unparsed data, can include <, >, &, " but not

The sequence ]]> always ends the section, so it cannot appear inside the section's data.

This introduction to XML is adequate in order to configure the agent by writing the manifest.xml file. If you would like to know more about XML the World Wide Web Consortium publishes the standard at http://www.w3.org/TR/xml11/.
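The points made in this primer can be checked with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module on a cut-down version of the example document; it is only an illustration and has nothing to do with how the agent itself reads the manifest.

```python
import xml.etree.ElementTree as ET

# A cut-down version of the primer's example document.
DOC = """<root>
  <child id="empty"/>
  <child id="escapes">&lt;, &gt;, &amp;, &quot;</child>
  <child id="cdata"><![CDATA[<, >, &, "]]></child>
</root>"""

root = ET.fromstring(DOC)
by_id = {child.get("id"): child for child in root}

# Entity references are decoded by the parser; CDATA is passed through as-is.
# Both therefore yield the same character data.
escaped_text = by_id["escapes"].text
cdata_text = by_id["cdata"].text


def summary(element):
    """Return the parts of an element that matter for comparison."""
    return (element.tag, dict(element.attrib), element.text, len(element))


# The self-closing form and the open/close pair parse to the same element.
short_form = ET.fromstring('<child id="empty"/>')
long_form = ET.fromstring('<child id="empty"></child>')
same = summary(short_form) == summary(long_form)
```

Here escaped_text and cdata_text both hold the four special characters in literal form, and same is True.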

Choosing SNMP OIDs

It is possible to configure the agent to send traps via SNMP. When configuring a trap you need to decide on the trap OID as well as the OIDs of the varbinds sent with the trap. It is essentially up to the user to choose whatever they want and manage any possible conflicts; e.g. if the network is entirely isolated and no other SNMP software is used, no conflict can ever happen and any OID will do.

However, as a general rule it is best to use a subtree in an IANA assigned private enterprise number. If the owner of the network where the agent is deployed has an enterprise number you are strongly advised to use a subtree from it. You should have an internal contact for this but if you don’t know who it is IANA also keeps a contact list for you to search.

If you do not have access to your (or your company’s) enterprise number you may use a subtree underneath the Abilisoft enterprise number. We have reserved the subtree 1.3.6.1.4.1.26788.200. While Abilisoft will never use an OID under this subtree, it must be noted that it could also be in use by other Abilisoft customers. Finally, please avoid using any Abilisoft owned OIDs outside of the 1.3.6.1.4.1.26788.200 subtree as we can not guarantee that the number will never conflict.

Manifest Values

Some manifest data is required in a specific format:

Format Description
type String that is a supported Component, Monitor or Test type. These are described in Monitor Types.
string Any alpha-numeric string, special characters should be avoided or escaped.
boolean A string that can be any usual true/false representation in upper or lower case: true/false; t/f; on/off; 1/0; yes/no; y/n.
int A string that represents an integer value (e.g. “1”, “2”).
float A string that represents a floating point value (e.g. “1.5” or “1.4567e-2”).
period A string that specifies a delta time in days, hours, minutes and seconds. See section Periodicity for more details.
systemTime A string that describes a system time in the following format: YYYY/MM/DD HH:MM:SS.
rawTime A string that describes a system time in the following format: YYYYMMDDHHMMSS.
regex A regular expression, the syntax of which is described in Regular Expression Syntax.
filepath A valid file or directory path. Environment variables are acceptable.
uri A valid URI (Uniform Resource Identifier).
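To make the boolean, systemTime and rawTime formats from the table concrete, here is a small Python sketch that validates such strings. The function names are hypothetical, and stripping surrounding whitespace is an assumption; the agent's own parsing may differ in such details.

```python
from datetime import datetime

# The true/false spellings listed in the table, compared case-insensitively.
TRUE_VALUES = {"true", "t", "on", "1", "yes", "y"}
FALSE_VALUES = {"false", "f", "off", "0", "no", "n"}


def parse_boolean(text: str) -> bool:
    """Interpret a manifest boolean string (hypothetical helper)."""
    value = text.strip().lower()  # assumption: surrounding whitespace is ignored
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError(f"not a boolean: {text!r}")


def parse_system_time(text: str) -> datetime:
    """Parse the systemTime format YYYY/MM/DD HH:MM:SS."""
    return datetime.strptime(text, "%Y/%m/%d %H:%M:%S")


def parse_raw_time(text: str) -> datetime:
    """Parse the rawTime format YYYYMMDDHHMMSS."""
    return datetime.strptime(text, "%Y%m%d%H%M%S")
```

For example, parse_system_time("2008/10/08 05:29:16") and parse_raw_time("20081008052916") describe the same moment.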

Periodicity

A periodicity is specified in the manifest in terms of days, hours, minutes and seconds, using the longhand format days:hours:minutes:seconds. For example:

1:12:23:22 = 1 Day, 12 Hours, 23 minutes and 22 Seconds

A shorthand format can be used, here are some examples:

60            one minute
:::60         one minute
::1:          one minute

::60:         one hour
:1::          one hour

1:0:0:0       one day
1:::          one day
:24::         one day

::1:30        one minute thirty seconds
1::15:30      87,330 seconds
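The examples above can be reproduced with a few lines of Python. This is only a sketch of the notation as illustrated by the examples, not the agent's parser: the function name is hypothetical, and treating empty or missing leading fields as zero is an assumption inferred from the table.

```python
def period_to_seconds(period: str) -> int:
    """Convert 'days:hours:minutes:seconds' (or its shorthand) to seconds."""
    fields = period.strip().split(":")
    if len(fields) > 4:
        raise ValueError(f"too many fields in period: {period!r}")
    # Assumption: the last field is always seconds, so a bare integer such
    # as "60" is padded on the left with empty (zero) fields.
    fields = [""] * (4 - len(fields)) + fields
    days, hours, minutes, seconds = (int(f) if f else 0 for f in fields)
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds
```

With this reading, period_to_seconds("1::15:30") gives 87330, matching the last row of the examples, and period_to_seconds(":24::") gives the same result as period_to_seconds("1:::").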

Action Containers

The following manifest elements can be “Action Containers”:

<Manifest>
<Fragment>
<Component>
<Observation>

That is, one can define actions (escalations like Trap and Smtp) within these elements. The action defined is in scope in all sub-elements. An action from any upper level can be overridden partially or entirely in a sub level, independent of peer levels and their overriding behaviour. For example:

<Trap name="defaultTrap">
  <Parameter index="1" name="destination">manager1-abilisoft.com</Parameter>
  <Parameter index="2" name="community">public</Parameter>
  <Parameter index="3" name="version">2c</Parameter>
  <Parameter index="4" name="oid">enterprises.26788.100</Parameter>
  <Parameter index="5" name="object">1;s;MINOR</Parameter>
  <Parameter index="8" name="object">4;s;%{message}</Parameter>
</Trap>
<Component name="SSHDaemon" type="GENERIC_APPLICATION">
  <Monitor name="processMonitor">
    <Sample>
      <Observation name="processUpObservation">
        ...
        <Trap name="defaultTrap">
          <Parameter name="object">1;s;CLEAR</Parameter>
        </Trap>
      </Observation>
      <Observation name="processDownObservation">
        <Trap name="defaultTrap">
          <Parameter name="object">1;s;CRITICAL</Parameter>
        </Trap>
      </Observation>
    </Sample>
  </Monitor>
</Component>

Actions are described fully in Actions.

Action Retry Behaviour

All actions can be retried if they fail. This is globally controlled by the notify_retrycount and notify_retrydelay settings and can also be changed for each action separately using the retrycount and retrydelay attributes on action definitions.

The count defines how many times an action should be retried if it fails. Setting retrycount to 0 means the action will only be attempted once, while setting it to 1 means the action will be tried twice, and so on. The special value of -1 can be used to force retry attempts forever, but care must be taken when using this: the agent might build up a huge number of actions to retry, with the result that agent notification virtually stalls while failed actions are retried.

The delay defines how long the notification subsystem should wait before attempting an action retry. This is a period setting, so the usual notation of using colons to specify days, hours or minutes can be used, or a simple integer expressing seconds. Depending on the type of action you may want to make this fairly short or very long.
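The relationship between retrycount, retrydelay and the number of attempts can be sketched as a simple loop. This is an illustration of the semantics described above only; run_action and its arguments are hypothetical names, and the real notification subsystem schedules retries rather than blocking.

```python
import time
from typing import Callable, Tuple


def run_action(action: Callable[[], bool],
               retrycount: int,
               retrydelay: float,
               sleep: Callable[[float], None] = time.sleep) -> Tuple[bool, int]:
    """Try an action, retrying on failure; return (succeeded, attempts)."""
    attempts = 0
    while True:
        attempts += 1
        if action():
            return True, attempts
        # retrycount is the number of *retries*, so the total number of
        # attempts is retrycount + 1. The value -1 means retry forever.
        if retrycount != -1 and attempts > retrycount:
            return False, attempts
        sleep(retrydelay)
```

An action that always fails is therefore attempted once with retrycount 0, twice with retrycount 1, and six times with the retrycount="5" used in the example that follows.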

An example of using the attributes to change the configured action retry settings for one specific action:

<Smtp name="defaultTrap" retrycount="5" retrydelay="300">
  ...
</Smtp>

Beware that whenever the agent gets reconfigured any actions which were scheduled for retrying will be lost and no longer retried. As soon as the first failed action has occurred the observation will be marked as processed in the OBSERVATION table and an entry will be added in the ACTIONX table with success set to false. On each action retry attempt the entry in the ACTIONX table will get its timestamp updated and if it succeeds success will also be set to true.

Placeholders

Warning

The recommended placeholder syntax is now {placeholder}. All other syntaxes are deprecated and should be phased out where possible.

Changed in version 7.0: The (now deprecated) %{placeholder} syntax was introduced.

Changed in version 7.2: The {placeholder} syntax was introduced.

Placeholders can be used in various parts of the manifest, such as observation messages or monitor parameters, to refer to values from elsewhere. Placeholders are simply names enclosed in braces:

{placeholder}

Here placeholder must be the name of a value which is known in the context; all available names are described in the table below. It is also possible to use the placeholders with surrounding text, e.g. for an observation message you could use:

The disk {Parameter.device} on {hostname} is {Facet.percentUsed}% full.

If you would like to use a literal { or } character you should double it up as such:

This is {{text enclosed in braces }}

The following are all the placeholder names which can be used inside the braces:

Place Holder Description
hostname The name of the host the agent is running on.
fqdn The fully qualified domain name the agent is running on.
ip The IP address of the host the agent is running on. If the host has multiple interfaces, this is a comma separated value list.
dts Timestamp of an observation as YYYY/MM/DD HH:MM:SS.
dtsNice Timestamp of an observation in a human readable format.
epoch UNIX Timestamp of an observation, can only be used inside an observation’s <Message/> element.
Facet.<facet_name> The value of a sample facet from the sample used for the observation.
Facet.<facet_name>.type The type of a sample facet from the sample used for the observation.
Facet.* A comma separated list of name-values for each available facet in the sample used in the observation.
LastFacet.<name> The value of a facet from the last sample, i.e. the sample before the one which raised this observation.
LastFacet.<name>.type The type of a facet from the last sample.
LastFacet.* A comma separated list of name-value pairs for each facet from the last sample.
message The message defined for the observation, can only be used in an action parameter.
Parameter.<name> The parameter to a monitor, see Parameter placeholders below.
<Variable-name> Variable value, see Variable placeholders below

It is also possible to format how numbers are converted to strings by using a small formatting language inside braces:

{Facet.dfPercentUsage:.1f}

This would format the dfPercentUsage facet, which is a floating point number, to one decimal place.

The full placeholder syntax is:

{key[:[width][.precision][type]]}
key:The placeholder name.
width:Minimum number of characters the formatted value must occupy.
precision:The number of decimal places to print if the key refers to a floating point number.
type:Conversion character of how to display the value (type specifier).

The following type specifiers are valid. Depending on the original value (string, int, float) not all conversions are possible, e.g. it is not possible to display a string as a number.

  Strings
s string (default for string values)
  Integers
d decimal integer (default for integer values)
o octal
x hex, lower-case
X hex, upper-case
  Floats
e exponent notation, using “e”
E exponent notation, using “E”
f fixed point
g general format, switches between fixed point and exponent notation based on precision and magnitude (default for float values)
G same as g but uses “E” for exponent notation
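The placeholder syntax and type specifiers described above closely resemble Python's own format mini-language, so their effect can be previewed with a short sketch. This is not the agent's implementation: PlaceholderFormatter and render are hypothetical names, and the values dictionary stands in for whatever the agent knows in a given context.

```python
import string


class PlaceholderFormatter(string.Formatter):
    """Look up whole dotted keys such as 'Facet.percentUsed' in a dict."""

    def __init__(self, values):
        super().__init__()
        self._values = values

    def get_field(self, field_name, args, kwargs):
        # Treat the complete key as one name instead of Python's default
        # attribute lookup on the part before the dot.
        return self._values[field_name], field_name


def render(template: str, values: dict) -> str:
    """Substitute {key[:[width][.precision][type]]} placeholders."""
    return PlaceholderFormatter(values).format(template)


values = {
    "Parameter.device": "/dev/sda1",
    "hostname": "web01",
    "Facet.percentUsed": 91.256,
}
message = render(
    "The disk {Parameter.device} on {hostname} is {Facet.percentUsed:.1f}% full.",
    values,
)
```

Here message is "The disk /dev/sda1 on web01 is 91.3% full.", and doubled braces such as {{ and }} come out as single literal braces.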

Note

There are several older placeholder syntaxes you might encounter. You should no longer use them as they are not capable of formatting their values and have other deficiencies. But in case you encounter them they are:

%(placeholder)s
%(placeholder)
%{placeholder}
%placeholder

Do not use them, they are deprecated and will be removed in future versions. Furthermore not each syntax is available in each context and these rules are confusing.

Parameter placeholders

Parameter placeholders can be used inside <Message/> tags. A placeholder named Parameter.<param-name> is dynamically created for each Sample Parameter and when used will be substituted with the parameter value. For example, if the FileMonitor sample parameter “path” is specified as such:

{Parameter.path}

it will be substituted with the parameter value (e.g. /etc/syslog.conf).

Note

Parameter placeholders cannot be used in an Action’s parameters.

Facet placeholders

All Facet and LastFacet placeholders can be used inside <Message/> tags. They allow you to access a facet’s type and value from the sample that gave rise to the observation. Additionally you can access the same for a previous sample using LastFacet.

It is also possible to use the %{Facet.<facet_name>}, %{Facet.<facet_name>.type} and %{Facet.*} placeholders inside action parameters.

Note

LastFacet style place-holders cannot be used in an Action’s parameters.

Variable placeholders

Variable parameters are described in section <Variable>. A variable is a component level tag that has a value that can be re-used in sample, observation and action parameters. A placeholder named <variable-name> is dynamically created for each Variable parameter and when used will be substituted with the Variable parameter value. For example, if a Component has a Variable parameter named “appName” and a placeholder is specified as such:

%{appName}

it will be substituted with the parameter value (e.g. “Exchange Server”).

Special Cases

  • Using placeholders in Sample Parameters: Only %{hostname}, %{ip} and %{<Variable-name>} can be used.
  • Using the message placeholder: %{message} can only be used in Action parameters.

Built-in Components

Even with no Manifest content, a set of built-in components is defined that monitors the health of the host the Agent is running on. By default, these components and most of their monitors are enabled. The default behaviour can be modified in the Agent’s runtime configuration or more directly in the Manifest. Refer to Built-ins and Runtime Configuration for more details.

defaultTrap

Every built-in definition has an Observation that specifies a Trap action called defaultTrap. This means that simply specifying a Trap action called defaultTrap in an Action container at a relevant point in the manifest (usually at the manifest level) will allow those observations to be escalated accordingly.

<Manifest>

Attribute Value Description
owner string Responsible owner of the manifest.
updated systemTime Time the manifest was updated. This timestamp is used by the agent to determine if a new manifest should be loaded.
effectiveFrom systemTime Time the manifest is effective from. This timestamp is used by the agent to determine when a new manifest should be loaded.
xmlns:xi uri Should always be “http://www.w3.org/2001/XInclude”. This is only required if <xi:include> tags are used (see <xi:include>).

The <Manifest> tag is the monitoring configuration entry-point. All other configuration is defined inside this tag.

<xi:include>

The xi:include element provides a powerful way to structure MA monitoring configurations. Using the xi:include directive, sub-manifests can be combined in various ways, providing a monitoring configuration that can be acquired from various locations. Sub-manifests can be included from local or shared file-systems or from HTTP services. Sub-manifests can themselves include other sub-manifests.

An XInclude is defined as follows:

<Manifest name="Abilisoft"
          owner="Abilisoft"
          updated="2008/10/08 05:29:16"
          effectiveFrom="2007/01/01 00:00:00"
          xmlns:xi="http://www.w3.org/2001/XInclude">

  <xi:include href="https://earth.abilisoft.com/xml/host_mon.xml"/>

</Manifest>

The href attribute can be any valid Uniform Resource Identifier (URI). File system includes should be prefixed with file://, for example:

<xi:include href="file:///opt/abilisoft.com/asagent/etc/sub_manifest.xml"/>

Care should be taken on Windows platforms: the leading / is still required. Furthermore, take care with spaces in file names and paths, which must be written as %20. For example:

<xi:include href="file:///C:/Program%20Files/Abilisoft.com/asagent/etc/sub_manifest.xml"/>
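If href values are generated by a script, the two rules above (leading slash and escaped spaces) are easy to get wrong. The following Python sketch shows one way to build such a value; file_uri is a hypothetical helper, not part of MA.

```python
from urllib.parse import quote


def file_uri(path: str) -> str:
    """Build a file:// URI suitable for an xi:include href attribute."""
    # Windows paths may use backslashes; URIs always use forward slashes.
    normalised = path.replace("\\", "/")
    # Drive-letter paths such as C:/... still need a leading slash.
    if not normalised.startswith("/"):
        normalised = "/" + normalised
    # Percent-encode spaces and other unsafe characters, keeping / and :.
    return "file://" + quote(normalised, safe="/:")
```

Applied to the two paths used in this section, the function reproduces the href values shown in the examples.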

An include for a sub-manifest hosted on a web server would be defined like this:

<xi:include href="http://earth.abilisoft.com/ma/xml/defaultTrapDef.xml"/>

A sub-manifest generated by a RESTful service:

<xi:include href="http://earth.abilisoft.com/TrapDefs?id=Default"/>

A sub-manifest generated by a RESTful service utilising the !FQDN! macro:

<xi:include href="http://earth.abilisoft.com/GetManifest?server=!FQDN!"/>

Fragments

An XInclude can be defined almost anywhere but the included content must be valid in relation to the place it is included. For example:

<Component name="oracle_component" type="GENERIC_EMPTY">
  <xi:include href="https://earth.abilisoft.com/xml/oracle_monitors.xml"/>
</Component>

should include only a single monitor definition. If the included document has more than one root element (e.g. more than one <Monitor> is defined), you must encapsulate the included document content in a single root element. For this purpose a Fragment element is supported by the agent’s DTD, for example:

<Fragment xmlns:xi="http://www.w3.org/2001/XInclude">
  <Monitor name="dbMon1" type="agent.mon.dbMonitor">
  ...
  </Monitor>
  <Monitor name="dbMon2" type="agent.mon.dbMonitor">
  ...
  </Monitor>
</Fragment>

Note

When using the xi:include element, the XML file must have the XInclude namespace xi declared by adding the xmlns:xi="http://www.w3.org/2001/XInclude" attribute to the top level element in the including XML file.

Fallbacks

The XInclude mechanism also provides the ability to cope with connectivity issues by allowing for alternate XML to be used if an include fails. One can specify an alternate XInclude, some XML or nothing. Here is a fallback example:

<xi:include href="https://earth.abilisoft.com/xml/host_mon.xml">
  <xi:fallback xmlns:xi="http://www.w3.org/2001/XInclude">
    <!-- Get host monitoring defs from an alternate server -->
    <xi:include href="http://mars.abilisoft.com/xml/host_mon.xml"/>
  </xi:fallback>
</xi:include>

Fallbacks can also be nested to many levels, catering for multiple inclusion failures:

<xi:include href="http://earth.abilisoft.com/xml/host_mon.xml">
  <xi:fallback xmlns:xi="http://www.w3.org/2001/XInclude">
    <xi:include href="http://mars.abilisoft.com/xml/host_mon.xml">
      <xi:fallback xmlns:xi="http://www.w3.org/2001/XInclude">
        <!-- Host monitoring unavailable, here one could specify:
               * A heartbeat monitor that sends a trap indicating
                 the agent is not fully configured.
               * Basic monitoring XML.
               * An XInclude from the file system.
        -->
      </xi:fallback>
    </xi:include>
  </xi:fallback>
</xi:include>

Summary of inclusion do’s and don’ts

Only certain manifest elements can contain child elements that are <xi:include> elements (this is largely driven by the number of child elements a given tag has and whether the DTD supports child <Fragment> elements):

  • The <Manifest> element can have child <xi:include> elements, but make sure you specify the xmlns:xi attribute on the <Manifest> element if includes are used. Included content must be compatible with <Manifest> content, i.e.:

    Variable|Trap|Smtp|Cmd|Notify|Component|xi:include|Fragment

  • If the included document has more than one root element (e.g. more than one <Variable> element), encapsulate in a <Fragment> element.

  • The <Component> element can have an immediate child <xi:include> element. Included content must be compatible with <Component> content, i.e.:

    Monitor|Default|Variable|Trap|Smtp|Cmd|Notify|xi:include|Fragment

  • If the included document has more than one root element (e.g. more than one <Monitor> element), encapsulate content in a <Fragment> element.

  • The <Fragment> element can also contain <xi:include> elements but with a caveat: Sub-documents cannot themselves contain a top level <Fragment> element so the content must be singular.

XInclude Example

This example shows how you can use <Variable> elements in a top level manifest to drive the monitoring behaviour of a configuration in an included manifest.

Consider that one used a web service to generate master manifest XML data. The web service could generate the following based on a parameter specified in the MasterManifestURI. If the calling agent’s MasterManifestURI parameter was dept=development, variable values generated in the manifest could specify database connection parameters for monitored databases in the development department:

<Manifest name="Abilisoft"
          owner="Abilisoft"
          updated="2008/10/08 05:29:16"
          effectiveFrom="2007/01/01 00:00:00"
          xmlns:xi="http://www.w3.org/2001/XInclude">

  <Variable name="dbuser">devuser</Variable>
  <Variable name="dbpwd">dev-secret</Variable>
  <Variable name="dbname">ora_dev</Variable>

  <xi:include href="https://earth.abilisoft.com/get_config?id=dbmonitoring"/>

</Manifest>

A parameter of dept=production would cause a different manifest with different variable values:

<Manifest name="Abilisoft"
          owner="Abilisoft"
          updated="2008/10/08 05:29:16"
          effectiveFrom="2007/01/01 00:00:00"
          xmlns:xi="http://www.w3.org/2001/XInclude">

  <Variable name="dbuser">produser</Variable>
  <Variable name="dbpwd">prod-secret</Variable>
  <Variable name="dbname">ora_prod</Variable>

  <xi:include href="https://earth.abilisoft.com/get_config?id=dbmonitoring"/>

</Manifest>

The included database monitoring fragment would be used in development and production and a database monitor within could be defined as follows:

<Component name="oramon" type="GENERIC_EMPTY">
  <Monitor name="oraMonitor1" type="agent.mon.dbMonitor" periodicity="30">
    <Parameter name="flavour">ORACLE</Parameter>
    <Parameter name="connStr">%{dbuser}/%{dbpwd}@%{dbname}</Parameter>
    <Parameter name="query">select * from EMPLOYEES</Parameter>
    <Parameter name="numRows">10</Parameter>
  </Monitor>
  <Monitor name="oraMonitor2" type="agent.mon.dbMonitor" periodicity="30">
   ...
  </Monitor>
</Component>

The important point to note is that only one database monitoring configuration would need to be specified in the sub-manifest, using the variable values specified in the top-level manifest to parameterise the database monitors within a sub-manifest. Therefore judicious use of the xi:include mechanism can greatly reduce the amount of configuration that needs to be written.

<User>

Attribute Value Description
name string User’s name.
pwd string Password.
encrypted boolean If true then ‘pwd’ attribute is treated as a cipher. If this attribute is omitted false is implied.

The <User> tag is used to specify a user, their password and what features they can or cannot invoke on the AAPI and XML RPC API. This tag should encapsulate one or more <Feature> tags.

<Feature>

Attribute Value Description
name string Feature name, for example: ma.aapi.rpc_query.

The <Feature> tag is used to specify an allowed or disallowed feature for a user. The element content is #PCDATA that specifies a boolean value (i.e. true or false). Valid values for name (features) are:

  • ma.aapi.rpc_check - Invoke a check for new monitoring configuration.
  • ma.aapi.rpc_forcecheck - Invoke a check for new monitoring configuration (this call will cause the agent to load the monitoring configuration regardless of whether it is detected as updated or not).
  • ma.aapi.rpc_query - Query the agent store.
  • ma.aapi.rpc_manifest - Get current monitoring configuration.
  • ma.aapi.rpc_setmanifest - Inject monitoring configuration into the agent.
  • ma.aapi.rpc_setfragment - Save a monitoring configuration fragment using the name provided in the agent’s etc directory.
  • ma.aapi.rpc_getfragment - Get a previously ‘set’ monitoring configuration fragment.
  • ma.aapi.rpc_getcfg - Get a runtime configuration value.
  • ma.aapi.rpc_setcfg - Set a runtime configuration value.
  • ma.aapi.rpc_getcfgall - Get all runtime configuration values.
  • ma.aapi.rpc_savecfg - Save runtime configuration updated with setcfg calls. This ensures AAPI RTC changes are persisted between agent restarts.
  • ma.aapi.rpc_getloglevel - Get the level the agent is logging to its log file at.
  • ma.aapi.rpc_setloglevel - Set the level the agent should log to its log file at (this takes immediate effect but does not persist between agent restarts).
  • ma.aapi.rpc_status - Get a summary of the agent’s status.
  • ma.aapi.rpc_uptime - Get the agent uptime in seconds. Note this method is a minimum requirement for an MAQL initiated session and as such should be enabled if the user is to connect using MAQL.

If a feature is not specified in a user definition false is implied.

<hostgroup>

The <hostgroup/> tag is used to specify large groups of hosts in one place. These groups can be used by most of the network monitors (agent.net.*). Here is an example:

<hostgroup name="foo">
  <host>1.2.3.4</host>
  <host>1.2.3.5</host>
  <range>
    <start>1.2.3.10</start>
    <end>1.2.3.25</end>
  </range>
  <subnet>
    <net>1.2.3.0/24</net>
    <exclude>
      <!-- net & broadcast addresses are excluded by default -->
      <host>1.2.3.10</host>
      <range>
        <start>1.2.3.4</start>
        <end>1.2.3.8</end>
      </range>
    </exclude>
    <include>
      <!-- Same syntax as for exclude -->
    </include>
  </subnet>
</hostgroup>

A hostgroup needs a name attribute which will be used in a monitor’s parameters to refer to this hostgroup. Inside a hostgroup there are three ways to specify hosts:

<host/>:By specifying individual hosts. These can be either IP addresses or hostnames.
<range/>:By specifying an inclusive range of IP addresses. Use the <start/> and <end/> tags to specify the IP addresses. (Note that only IP addresses can be used here).
<subnet/>:By specifying an entire subnet at once. The subnet is specified in CIDR notation using the <net/> tag. By default the network and broadcast address of a subnet are excluded. It is possible however to use one <exclude/> and one <include/> tag (in that order) to exclude and include a range of hosts. These elements can contain any number of <host/> and <range/> tags.
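To get a feel for which addresses a <range/> or <subnet/> definition covers, the rules above can be sketched with Python's standard ipaddress module. The helper names are hypothetical, and applying excludes before includes is an assumption based on the order stated above; the agent may resolve overlaps differently.

```python
import ipaddress


def expand_range(start: str, end: str):
    """Return every address from start to end, both inclusive."""
    first = int(ipaddress.IPv4Address(start))
    last = int(ipaddress.IPv4Address(end))
    return [ipaddress.IPv4Address(n) for n in range(first, last + 1)]


def expand_subnet(net: str, exclude=(), include=()):
    """Return the hosts of a CIDR subnet after excludes and includes."""
    # hosts() already leaves out the network and broadcast addresses.
    hosts = set(ipaddress.ip_network(net).hosts())
    hosts -= set(exclude)  # assumption: excludes are applied first...
    hosts |= set(include)  # ...and includes are added back afterwards
    return sorted(hosts)


# The <subnet/> part of the example hostgroup "foo".
excluded = [ipaddress.IPv4Address("1.2.3.10")] + expand_range("1.2.3.4", "1.2.3.8")
subnet_hosts = expand_subnet("1.2.3.0/24", exclude=excluded)
```

For the example hostgroup this leaves 248 of the 254 usable addresses in 1.2.3.0/24, and the separate <range/> from 1.2.3.10 to 1.2.3.25 covers 16 addresses.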

<perfmon>

The <perfmon/> tag is used to specify one or more performance data destinations. This setting is necessary if you want asagent to propagate performance metrics to a performance data collector (e.g. Abilisoft up) and have specified <Perfstat/> directives within a <Monitor/> definition (see section describing <Perfstat>). Here is an example:

<perfmon max_period="60" max_buffer_size="512">
  <destination>
    <host>localhost</host>
    <port>50000</port>
  </destination>
</perfmon>

Only one <perfmon/> element is allowed. It has the following attributes:

max_period:A Period value that specifies the maximum amount of time to wait before any collected metrics are sent when the output buffer is less than max_buffer_size. This attribute supports the Periodicity syntax. This attribute is optional and if omitted the default is 60 seconds.
max_buffer_size:Maximum size in bytes of the output buffer used to queue performance data metrics before they are dispatched to a destination. This attribute is optional and if omitted the default is 512 bytes.
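How the two attributes interact can be pictured as a buffer that is flushed either when it is full or when its oldest entry is too old, whichever happens first. The following Python sketch illustrates that reading; MetricBuffer is a hypothetical class and the agent's actual buffering logic is not documented here.

```python
class MetricBuffer:
    """Queue metrics and flush on size or age, whichever comes first."""

    def __init__(self, max_period: float = 60.0, max_buffer_size: int = 512):
        self.max_period = max_period            # seconds
        self.max_buffer_size = max_buffer_size  # bytes
        self._items = []
        self._size = 0
        self._oldest = None  # arrival time of the first queued metric

    def add(self, metric: bytes, now: float) -> None:
        """Queue one encoded metric that arrived at time 'now'."""
        if self._oldest is None:
            self._oldest = now
        self._items.append(metric)
        self._size += len(metric)

    def should_flush(self, now: float) -> bool:
        """True when the buffer is full or has waited max_period."""
        if not self._items:
            return False
        full = self._size >= self.max_buffer_size
        stale = now - self._oldest >= self.max_period
        return full or stale

    def flush(self) -> list:
        """Return the queued metrics and reset the buffer."""
        items, self._items = self._items, []
        self._size = 0
        self._oldest = None
        return items
```

With the defaults, a single 100 byte metric would be held for up to 60 seconds, while 512 bytes or more of queued metrics would be sent straight away.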

The <perfmon/> element must have at least one <destination/> sub-element. Multiple <destination/> sub-elements can be defined to support high availability configurations. The <destination/> sub-element does not have any attributes but has exactly one of each of the following sub-elements:

<host/>:The IP address, hostname or FQDN of a performance data collector.
<port/>:The port number a performance data collector is listening on.

<credentials>

The <credentials/> tag is used to specify login credentials that might be shared by a number of monitors. Here is an example:

<credentials name="id">
  <username>user</username>
  <password encrypted="false">secret</password>
  <realm>The Realm</realm>
</credentials>

This tag can be used as a child of the <Manifest/> or <Fragment> elements. The name attribute is required and must be unique for all <credentials/> tags. It is used by monitors to refer to this specific set of credentials.

<username/>:The username, this is shared between all authentication mechanisms which need a username.
<password/>:The password, this is shared between all authentication mechanisms which have a password associated with the username.
<realm/>:For HTTP authentication this is the “realm” the username and password should be used for.

When setting the encrypted attribute of the <password/> element to true the string must be encrypted using the public key used by asagent. Otherwise the value will be assumed to be the cleartext password.

It is also possible to use a variable placeholder as the value, which allows you to use the ascrypt tool to create an encrypted variable, e.g.:

<!-- This variable is created by ascrypt in a fragment -->
<Variable name="pwd" encrypted="True">AFgT...nARe0</Variable>

<credentials name="creds">
  <username>user</username>
  <password>%{pwd}</password>
  <realm>Fake Realm</realm>
</credentials>

<Fragment>

The <Fragment> tag has no attributes. It is used to make sub-manifests valid standalone XML documents. If any tags are used in a sub-manifest, they must be encapsulated in <Fragment>... </Fragment> start and end tags.

<Component>

Attribute Value Description
name string Component’s name.
type type The type of component that specifies the built-in component type, e.g. HOST_CPU, or GENERIC_EMPTY. Specifying the component type incorporates pre-defined monitoring behaviour. See Component Type for more information. Defaults to GENERIC_EMPTY if omitted.
label string Enterprise Only
description string Enterprise Only
img string Enterprise Only
xmlns:xi uri If the component utilises x-include directives, this attribute should always be “http://www.w3.org/2001/XInclude” to specify the xi namespace.

The <Component> tag is fundamental in the monitoring configuration. All Monitors, Observations and Actions are described within it, including:

  • <Variable> - A Variable Parameter, described in <Variable>.

  • <Monitor> - A monitor definition, described in <Monitor>.

  • <Trap> - A Trap action definition, described in <Trap>.

  • <Smtp> - An SMTP action definition, described in <Smtp>.

  • <Cmd> - A Command action definition, described in <Cmd>.

  • <Control> - A Control action definition, described in <Control>.

  • <Notify> - A Notify action definition, described in <Notify>.

Component Type

The type attribute is mandatory when specifying a component. Valid values include:

  • HOST_CPU (default name cpu)
  • HOST_DISK (default name disk)
  • HOST_MEM (default name memory)
  • HOST_MON (default name monitor)
  • GENERIC_APPLICATION (default name __GENERIC_APPLICATION__)
  • GENERIC_EMPTY (No default name)

Other types are available; contact Abilisoft for more information. When the type attribute specifies a “built-in”, the name attribute is ignored and the built-in name is used. You can add any monitors and additional configuration you like to a component that uses a built-in type, but if you specify a configuration item name that is already in use you will override or replace the default behaviour of the built-in.

GENERIC_APPLICATION has some predefined behaviour configured and has a default name __GENERIC_APPLICATION__ which you should override. If you want a completely blank configuration component you should use the GENERIC_EMPTY type. Note that if you do not specify a name one is automatically created.
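
The following sketch shows the three cases side by side. The names billing and custom are hypothetical, and the <Fragment> wrapper is only there to keep the example well-formed:

```xml
<Fragment>
  <!-- Built-in type: the built-in name (cpu) is used -->
  <Component type="HOST_CPU"/>

  <!-- GENERIC_APPLICATION: replace the default name with your own -->
  <Component name="billing" type="GENERIC_APPLICATION"/>

  <!-- GENERIC_EMPTY: a blank component with no predefined behaviour -->
  <Component name="custom" type="GENERIC_EMPTY"/>
</Fragment>
```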

<Variable>

Attribute Value Description
name string Variable parameter name
encrypted boolean If true, the parameter is interpreted as a cipher. If this attribute is omitted false is implied.
label string Enterprise Only
description string Enterprise Only
img string Enterprise Only

The <Variable/> tag provides a mechanism to set up placeholder data to be used for mapping onto Sample parameters, Observation messages and Action parameters. A variable may be defined as follows:

<Variable name="appName">SQLServer</Variable>

This allows %{appName} to be used as a placeholder elsewhere in the Component definition. Refer to Placeholders for more information.

You can use the <Variable/> tag as a child of <Manifest/>, <Fragment/>, <Component/> and <Monitor/>. Each time it will be scoped to the level where it is used with the exception that having it at fragment level is the same as at manifest level.
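
A sketch of the scoping levels, assuming hypothetical variable, component and monitor names (the monitor's own parameters are omitted for brevity):

```xml
<Fragment>
  <!-- Fragment level: treated the same as manifest level -->
  <Variable name="appName">SQLServer</Variable>

  <Component name="database" type="GENERIC_EMPTY">
    <!-- Component level: scoped to this component -->
    <Variable name="owner">dba-team</Variable>

    <Monitor name="users" type="agent.mon.dbMonitor">
      <!-- Monitor level: scoped to this monitor -->
      <Variable name="limit">50</Variable>
    </Monitor>
  </Component>
</Fragment>
```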

<Monitor>

Attribute Value Description
name string Monitor’s name
regex boolean When true, the monitor name is interpreted as a regular expression and subsequent settings are applied to monitors with matching names.
type type The type of monitor. All monitors are prefixed with “agent”. There are 4 categories of Monitor type: agent.cpu, agent.disk, agent.mem and agent.mon. See Monitor Types for details of all available monitors in each of these categories.
periodicity period The frequency (in seconds) at which the monitor collects sample facet values. A short-hand may be used as described in Periodicity.
enabled boolean The initial enabled state of the monitor. The presence of the sample definition in the manifest (without this attribute defined) implies an enabled state of true.
label string Enterprise Only
description string Enterprise Only
img string Enterprise Only

The <Monitor/> element allows you to define or redefine the monitoring behaviour for a Component. The <Monitor/> element may enclose:

  • Perfstat directives that elect which sample facet values should be dispatched as performance data.
  • Monitor Parameters that coerce the behaviour of that instance (e.g. the path to a file or a regular expression fingerprint for a specific process that you want to monitor). Not all monitor definitions require parameters to be specified.
  • One or more Observations you want to make on the sample results produced by the monitor.
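
Putting these pieces together, a monitor definition might be sketched as follows. The type value agent.cpu.cpuUsage and the observation name are assumptions made for illustration; consult Monitor Types for the actual type names. The order of child elements is also assumed, and the DTD remains authoritative:

```xml
<Component name="cpu" type="HOST_CPU">
  <!-- The type value below is an assumed name, see Monitor Types -->
  <Monitor name="cpuUsage" type="agent.cpu.cpuUsage" periodicity="60">
    <!-- Dispatch this facet as performance data -->
    <Perfstat>percentTotal</Perfstat>

    <!-- Hypothetical observation on the sample results -->
    <Observation name="cpuHigh">
      <Test type="compare">
        <Parameter name="arg0">$percentTotal</Parameter>
        <Parameter name="operator">gt</Parameter>
        <Parameter name="arg1">75</Parameter>
      </Test>
      <Message>CPU usage is above 75%</Message>
    </Observation>
  </Monitor>
</Component>
```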

<Perfstat>

A <Perfstat/> element specifies whether a monitor’s sample facet values should be captured and dispatched as performance data metrics to the destinations specified using the <perfmon> element. The <Perfstat/> element does not have any attributes. The element’s #PCDATA value specifies the name of a sample facet to capture, e.g.:

<Perfstat>percentTotal</Perfstat>
<Perfstat>percentUser</Perfstat>
<Perfstat>percentSys</Perfstat>

The following shorthand is allowed:

<Perfstat>*</Perfstat>

This will simply collect and dispatch all sample facets as performance data, regardless of which other <Perfstat/> directives have been specified.

If the facet name specified does not relate to a facet for the parent monitor’s type it will be ignored. Refer to Monitor Types for a definition of valid sample facet names.

Note

It is possible to send non-scalar data as a performance data metric. Abilisoft up makes it possible to turn non-scalar data into scalar data (e.g. values like ‘running’, ‘suspended’ and ‘off’ can be mapped to 1, 2 and 3 respectively).

<Perfstat> elements are only permitted within <Monitor> definitions.

<Parameter>

Attribute Value Description
name string Parameter’s name
index int When a parameter is used in certain roles in a Manifest, the index value describes an ordinal position for the parameter.
encrypted boolean If true, the parameter is interpreted as a cipher. If this attribute is omitted false is implied.
label string Not used
description string Not used
img string Not used

The <Parameter> tag is used to provide parameter data for Monitors, Observation Tests, and Actions. A parameter has a name and a #PCDATA value as follows:

<Parameter name="regex">/usr/bin/sshd</Parameter>

Consult <Observation> and the relevant appendices Monitor Types and Actions for more information on parameter setting for Tests, Monitor parameters and Action parameters respectively.

<Observation>

Attribute Value Description
name string Observation’s name
enabled boolean Defines if the Observation is active. If this attribute is omitted true is implied.
label string Not used
description string Not used
img string Not used

An <Observation> tag is used to specify tests on sample data. Specifying an Observation is the first step towards getting a notification on a particular event, for example when a value exceeds a particular threshold, when some pre-defined text appears in a log file, or when the timestamp of a file changes. Some observations are predefined and their behaviour can be modified; for example, an alternate message can be associated or the test adjusted. Predefined observations can also be completely overridden, and completely new observations can be added. The following sub-sections detail an Observation’s content.

<Test>

The <Test/> tag specifies the test which must be performed to determine if an observation should be created or not. Most tests take parameters which can be either literals or functions where functions can be used to extract data out of the facets of the current sample.

Attribute Value Description
name string Test name (optional)
type type The Test type: see appendix Tests
edge string Enable edge detection, up or down (optional)
label string Not used
description string Not used
img string Not used

Each <Test/> tag must have the type attribute, which specifies which test will be used. Available test types include threshold, compare, null, notNull, true, false, processUp, processDown and regex. Refer to Tests for a full description of all available test types. Each test type has one or more parameters, some of which have fixed values while others (usually named argN) allow arbitrary Test Argument Expressions. Consider the following compare test example:

<Test type="compare">
  <Parameter name="arg0">$percentTotal</Parameter>
  <Parameter name="operator">gt</Parameter>
  <Parameter name="arg1">75</Parameter>
</Test>

It is very important that a facet value is specified with a leading ‘$’ to distinguish it from a literal string. The compare test takes parameters named arg0, arg1 and operator. The arg0 and arg1 parameters can be either literals or facet IDs. The operator must be a valid relational operator: eq, ne, lt, le, gt or ge.

Edge Attribute

The edge attribute is a suppression mechanism. When set, a test will only return true (and potentially cause an observation to fire) on a condition edge. That is, when a test result changes from the previous test result.

Every time a sample is acquired by a monitor the agent’s analysis engine looks for any observations defined for that monitor and runs its test against the current sample data (i.e. the latest sample, but depending on the test, the “last” sample may be checked too). The test will either return “true” or “false”. If it returns true, the observation will “fire” and any actions defined for the observation will be executed (e.g. a trap).

So for example, if a cpuUsage monitor gets a percentTotal sample facet value of 80% and an observation is defined with a compare test checking $percentTotal > 75 then the test will return true. If the next sample also gets a percentTotal sample facet value of 80% then obviously the test will return true again and the observation will fire again. Therefore the result of test runs on two samples is [true, true].

The “edge” attribute makes an observation fire on an edge condition. When using edge="up" the observation will only fire when a first test returns false and the very next test returns true. Imagine subsequent test runs returning [result, result] when edge="up":

[false, true] - observation will fire
[true, true] - observation will NOT fire
[true, false] - observation will NOT fire
[false, false] - observation will NOT fire

When edge="down" we get the opposite effect:

[false, true] - observation will NOT fire
[true, true] - observation will NOT fire
[true, false] - observation will fire
[false, false] - observation will NOT fire

Note that the previous and current test results used to evaluate the edge condition are the results the test would produce without the edge attribute being set.

Also note that when discussing observations firing we did not consider other suppression logic set by the <Suppression/> tag for the sake of simplicity.
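
As a sketch of how the attribute is applied, the compare test used earlier can be restricted to the rising edge. The observation name and message text are hypothetical:

```xml
<Observation name="cpuHighEdge">
  <!-- Fires only when the previous result was false and this one is true -->
  <Test type="compare" edge="up">
    <Parameter name="arg0">$percentTotal</Parameter>
    <Parameter name="operator">gt</Parameter>
    <Parameter name="arg1">75</Parameter>
  </Test>
  <Message>CPU usage has risen above 75%</Message>
</Observation>
```

With two consecutive samples at 80% this observation would fire at most once, on the sample where the result changed from false to true. Using edge="down" instead would fire when usage drops back to 75% or below.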

Test Argument Expressions

Most tests will have an argN parameter; its value is called a test argument expression. As briefly shown above, this can refer to a facet value or a literal value to create meaningful tests. The full expression syntax is richer, however, and caters for more involved test cases.

Facets can only have a limited number of data types. The following data types relate to testing facets:

  • boolean
  • integer
  • float
  • string

A test argument expression will always evaluate to a value that is one of these types. When executing the test it is possible to perform a comparison between different data types, but this may not always be very meaningful. For example, comparing a number with a string is valid but not very useful, whereas comparing a boolean with a number is meaningful because a boolean is represented by 0 or 1.

The basic elements of a test argument expression are literals and functions, the latter will always result in a simple value of a specific type and can be nested.

Literals are as follows:

int:
Integers are any character sequence containing just digits, e.g. 42.
float:
Floats are any character sequence containing digits with exactly one . (dot), e.g. 42.0.
string:
Strings are any other sequence of characters, e.g. foo.

There is no literal for the bool type, but since booleans are a sub-type of integers you can just use 1 and 0 with no semantic difference.

Functions allow you to perform some computation on a test argument expression. Essentially there are two types of functions:

Casting functions:
bool(), int(), float(), string()
Facet functions:
$, fvalue(), flast(), fdelta(), fexists(), fmissing(), fcount(), last(), delta()

Since nesting is allowed you can create more complicated expressions, e.g.:

<Parameter name="arg0">bool(fvalue(facetname))</Parameter>

If a function gets an invalid argument it will raise an exception, the result of which is that the corresponding test will fail. Note that this test failure will not count towards any behaviour of the suppression logic.

bool(val)

This will cast val to a boolean. It returns True for any non-zero number, including negative numbers, and for any string with a non-zero length. In all other cases it returns False.

int(val)

This will cast val to an integer. String values must comply with the format of integer literals for this cast to work.

This allows you to use data from facets as numbers instead of strings. Consider an agent.mon.dbMonitor sample which retrieved the number of users in a database. The row facet has a type of string so you cannot normally compare this sensibly in a test. This example casts it to an integer first to check there are no more than 50 users logged in:

<Test type="compare">
  <Parameter name="arg0">int(fvalue(row))</Parameter>
  <Parameter name="operator">gt</Parameter>
  <Parameter name="arg1">50</Parameter>
</Test>

float(val)

Cast val to a float. String values must comply with the format of float literals.

string(val)

Cast val to string. Useful if you want to treat a literal value as a string which would otherwise be interpreted as a number.
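
For instance, to compare a string facet against the text 42 rather than the number 42, a test could be sketched like this (the facet name row is borrowed from the dbMonitor example above; the value is illustrative):

```xml
<Test type="compare">
  <Parameter name="arg0">$row</Parameter>
  <Parameter name="operator">eq</Parameter>
  <!-- Without string() the literal 42 would be read as an integer -->
  <Parameter name="arg1">string(42)</Parameter>
</Test>
```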

fvalue(facet)

This function will return the value of the facet named facet (a string). The value returned will have the type of the facet. If the facet does not exist an exception will result.

$facet

The $ sign is a shortcut for fvalue(). The facet part must be the name of a facet and will be passed in as the argument to fvalue().

Note that using this shortcut does not allow nesting of other functions, facet can only be the name of a facet and not a function that returns the name of a facet.

flast(facet)

This is similar to fvalue() but will return the facet value of the last sample instead of the current sample.

fdelta(facet)

This will return the value of fvalue() - flast() for a facet. The facet types must be numerical.
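
A sketch of a test on the change between two consecutive samples, assuming percentTotal is a numerical facet and using an illustrative limit of 20:

```xml
<Test type="compare">
  <!-- Current percentTotal minus the percentTotal of the last sample -->
  <Parameter name="arg0">fdelta(percentTotal)</Parameter>
  <Parameter name="operator">gt</Parameter>
  <Parameter name="arg1">20</Parameter>
</Test>
```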

fexists(facet)

Returns True if a facet with the given name exists.

fmissing(facet)

Returns True if a facet with the given name does not exist.

fcount()

Returns the number of facets in the current sample as an integer.

last($facet)

Deprecated

This behaves as flast() but facet can only be the name of a facet; no nested functions are allowed. The $ sign is obligatory.

delta($facet)

Deprecated

This behaves as fdelta() but facet can only be the name of a facet; no nested functions are allowed. The $ sign is obligatory.

<Message>

The <Message/> tag specifies the message associated with an Observation. This is a free text string but may contain place-holders as described in the Placeholders section enabling a sample facet value, parameter values and global variable values to be used in the message. The Observation Message may be mapped onto an Action parameter with the {message} placeholder. The Message tag has no attributes.

<Suppression>

The suppression logic can be used to suppress observations from being created multiple times. By default a <Suppression> element does not have to be specified for an <Observation>. If it is omitted the following suppression definition is assumed:

<Suppression numberOfTimes="1" repeat="0"/>

See the remainder of this section to understand the effect of these default values.

Attribute Value Description
name string Suppression name (optional).
numberOfTimes int Count of successful tests before an observation is created.
repeat period This is a period (in seconds or using the shorthand described in Periodicity) that defines an Observation repeat window. See below for more information.
label string Not used
description string Not used
img string Not used

The numberOfTimes attribute controls how many times the test of the observation must evaluate to True before an observation is made. If the test evaluates to False before the counter reaches the number specified in numberOfTimes then the counter gets reset.

For example, given a CPU threshold observation at 95% and a numberOfTimes=3 the following situation would cause the observation to fire:

93%  99%  98%  98%  91%  91%  91%
     X    X    X

However this scenario:

91%    96%    96%    91%    96%    96%    91%
       X      X      RESET  X      X

would not cause the observation to fire.

The suppression can be further enhanced by specifying a “repeat” window. The repeat window specifies how long to wait before an observation can be “re-fired” for a persistent condition. There are 3 ways to set the repeat window:

Repeat Value Description
repeat=0 Always repeat observations. The repeat observation suppression window expires right away. This means that once the suppression numberOfTimes is overcome then the observation will re-fire and it will continue to re-fire until the original condition goes away.
repeat=<period> Repeat observations after <period>. The repeat observation suppression window expires after <period> seconds. <period> can be any positive integer or a ‘period’ value, e.g. ::1:. Once the period has expired, the suppression is reset and once numberOfTimes samples (indicating the condition still prevails) have been detected the observation will re-fire.
repeat=-1 Don’t create repeat observations. Repeat observations are suppressed until the condition that caused the observation is no longer true for at least one sample. This provides observations that fire on “condition edges”.
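
The two attributes can be combined as in the following sketch. The observation name, message and the values 3 and 600 are illustrative, and the order of the child elements is an assumption (the DTD is authoritative):

```xml
<Observation name="cpuSustained">
  <Test type="compare">
    <Parameter name="arg0">$percentTotal</Parameter>
    <Parameter name="operator">gt</Parameter>
    <Parameter name="arg1">95</Parameter>
  </Test>
  <Message>CPU usage above 95% for three consecutive samples</Message>
  <!-- Three consecutive true results are needed; repeat window of 600 seconds -->
  <Suppression numberOfTimes="3" repeat="600"/>
</Observation>
```

Replacing repeat="600" with repeat="-1" would suppress repeats until the condition has cleared for at least one sample.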

Finally, the suppression logic caters for observation “blackouts”. You can specify <Weekday> or <Date> tags within the <Suppression> tag so that observations are suppressed for particular days and times or for distinct periods. All times are interpreted as UTC. The day attribute is case insensitive. For example, the following:

<Weekday day="monday" startTime="09:00" endTime="10:30"/>

will suppress an observation every Monday morning from 09:00 HRS to 10:30 HRS. A weekday tag without a start and end time:

<Weekday day="saturday"/>

will suppress an observation every Saturday from 00:00 HRS to 23:59 HRS. Either startTime or endTime, or both, may be omitted. Note that the comparison excludes seconds, so the next observation allowed after a blackout that ends on Saturday at 23:59 will be one with a timestamp later than 00:00, for example 1 millisecond after midnight on Sunday morning.

Omitting the day attribute or setting day="*" applies the blackout to every day from Monday to Sunday. For example:

<Weekday day="*" startTime="09:00" endTime="10:30"/>

is equivalent to:

<Weekday startTime="09:00" endTime="10:30"/>

and is also equivalent to:

<Weekday day="monday" startTime="09:00" endTime="10:30"/>
<Weekday day="tuesday" startTime="09:00" endTime="10:30"/>
<Weekday day="wednesday" startTime="09:00" endTime="10:30"/>
<Weekday day="thursday" startTime="09:00" endTime="10:30"/>
<Weekday day="friday" startTime="09:00" endTime="10:30"/>
<Weekday day="saturday" startTime="09:00" endTime="10:30"/>
<Weekday day="sunday" startTime="09:00" endTime="10:30"/>

Therefore:

<Weekday/>

has the same effect as disabling the observation.

Date tags specify an absolute period with a start and end date-time (specification of each is mandatory):

<Date startDateTime="2008/06/28 09:00:00" endDateTime="2008/06/28 17:30:00"/>

will suppress an observation on 28th June 2008 from 09:00 HRS to 17:30 HRS. Note the seconds precision on dates. The startDateTime and endDateTime attributes must be in the format:

“YYYY/MM/DD HH:MM:SS”

Date style blackouts will happily overlap Weekday blackouts. If Date style blackouts overlap then the blackout will be effective from the earliest start date to the latest end date. For example:

<Weekday day="saturday" startTime="09:00" endTime="17:30"/>
<Date startDateTime="2009/08/01 15:00:00" endDateTime="2009/08/02 17:30:00"/>

As the 1st of August 2009 was a Saturday the blackout will have prevailed from 09:00 on Saturday 1st August 2009 until Sunday 2nd August 2009 at 17:30. Additionally:

<Weekday day="saturday" startTime="09:00" endTime="17:30"/>
<Date startDateTime="2009/08/01 17:30:00" endDateTime="2009/08/01 18:30:00"/>

will extend the Saturday blackout on the 1st August by 1 hour.
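
A complete suppression definition with blackouts might be sketched as follows. The window times and the date are illustrative assumptions; all times are UTC as noted above:

```xml
<Suppression numberOfTimes="1" repeat="-1">
  <!-- Hypothetical weekly maintenance window -->
  <Weekday day="sunday" startTime="02:00" endTime="04:00"/>
  <!-- Hypothetical one-off blackout; both attributes are mandatory -->
  <Date startDateTime="2009/12/25 00:00:00" endDateTime="2009/12/25 23:59:59"/>
</Suppression>
```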

Actions

Attribute Value Description
name string Action name. Some actions are pre-configured in “built-ins”; re-using the name of such an action will override its behaviour.
enabled boolean If true this attribute specifies that the action is enabled.
label string Not used
description string Not used
img string Not used

Actions can be added to the Observation definition. This allows you to enable, disable or override any default setting, or to add a new Action to the observation. Available actions are described in detail in <Trap>, <Smtp>, <Cmd>, <Control>, <Notify> and Actions.
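
As a rough sketch, actions attached to an observation could look like this. The names criticalTrap and opsMail are hypothetical definition names, any action parameters are omitted, and the order of child elements is assumed; see Actions for the actual options:

```xml
<Observation name="cpuHigh">
  <Test type="compare">
    <Parameter name="arg0">$percentTotal</Parameter>
    <Parameter name="operator">gt</Parameter>
    <Parameter name="arg1">75</Parameter>
  </Test>
  <Message>CPU usage is above 75%</Message>
  <!-- Hypothetical named definitions: one enabled, one disabled -->
  <Trap name="criticalTrap" enabled="true"/>
  <Smtp name="opsMail" enabled="false"/>
</Observation>
```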

<Trap>

A Trap Action will cause a pre-defined SNMP trap to be fired to escalate the Observation according to the named trap definition. See Actions.

<Smtp>

An Smtp Action will cause an SMTP message to be sent to escalate the Observation according to the named Smtp definition. See Actions.

<Cmd>

A Command Action will cause an operating system command to be invoked to escalate the Observation according to the named Command definition. See Actions.

<Control>

A Control Action can start or stop individual monitors (or all monitors for a component) if the parent observation fires. See Actions.

<Notify>

A Notify Action will cause an Abilisoft Event to be sent to Abilisoft’s Real-time Enterprise Event Feed (Reef) server for correlation and de-duplication. Events will be available for display in the Reef dashboard. See Actions.

Definition Override Behaviour

There is a mechanism that allows you to specify a range of settings that you may want to override, enable or disable. This is accomplished by setting the regex attribute on Sample or Observation to true. This means that the Sample or Observation name is interpreted as a regular expression and the subsequent settings are applied to all matching Sample or Observation definitions. For example, this definition will cause all diskUsage monitors with a matching name to be disabled:

<Component name="disk" type="HOST_DISK">
  <Monitor name="^/" regex="true" enabled="False"/>
</Component>

This definition will disable any Observation called diskUsageThreshold on all diskUsage monitors with a matching name:

<Component name="disk" type="HOST_DISK">
  <Monitor name="^/" regex="true">
    <Observation name="diskUsageThreshold" enabled="false"/>
  </Monitor>
</Component>

This definition will cause all diskUsage monitors with a matching name to have any Observation with a name matching the regular expression disk.*?$ to be enabled. Put simply, all observations (with a name like “disk-something”) for disks named sda1 and sda2 will be enabled:

<Component name="disk" type="HOST_DISK">
  <Monitor name="^/" regex="true">
    <Observation name="disk.*?$" regex="true" enabled="true"/>
  </Monitor>
</Component>

This mechanism can be used to override Observation Source Facets, Messages and Tests in batches. It can also be used to add additional Observations and Actions in batches.
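
For example, a batch override of an observation message might be sketched as follows, reusing the diskUsageThreshold observation name from above. The message text is illustrative:

```xml
<Component name="disk" type="HOST_DISK">
  <Monitor name="^/" regex="true">
    <!-- Replace the message on every matching monitor -->
    <Observation name="diskUsageThreshold">
      <Message>Disk usage threshold exceeded</Message>
    </Observation>
  </Monitor>
</Component>
```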

Agent Runtime Configuration

The Agent has many runtime configuration settings that can be set in the Agent configuration file ($AS_HOME/etc/asagent.conf), by command line parameter and by environment variable. Refer to Runtime Configuration for more information.