How-to use SADMIN alerting system

Requirements:

Understanding SADMIN alerting facilities

In this guide, we will demonstrate, with an example, how to use the SADMIN alerting facilities.

The default alert group

An alert can be issued in two ways: by a script or by the SADMIN System Monitor. An alert is always sent to an alert group. By default, the alert group used is the one specified in the SADMIN configuration file of the system that triggered the alert. Every system has its own SADMIN configuration file ($SADMIN/cfg/sadmin.cfg). After installation, the default alert group is called (guess what) ‘default’.

#----------------------------------------------------------------------------
# Default Alert Group is 'default' (As defined in $SADMIN/cfg/alert_group.cfg)
# Group specified here, MUST exist in the $SADMIN/cfg/alert_group.cfg.
#
# Use in Script:
#   Default Alert Group use for sending alert within your script.
#   Default can be overridden by changing 'SADM_ALERT_GROUP' variable at the
#   top of the script.
#
# Use by System Monitor
#   When the System Monitor detect something that it could monitor (Like a
#   new filesystem), a new monitoring line is added with this Alert group.
#   Default can be overridden by changing the Warning Alert Group (Column J)
#   and the Error Alert Group (Column K), in the System Monitor configuration
#   file of the system ($SADMIN/cfg/`hostname -s`.smon).
#
#----------------------------------------------------------------------------
SADM_ALERT_GROUP = default
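The comment above says that the group named in sadmin.cfg must exist in alert_group.cfg. A small shell sketch like the one below can check that on a given system. It is not part of SADMIN: the functions `cfg_value` and `group_exists` are hypothetical. It assumes sadmin.cfg uses the `KEY = value` layout shown above, and that the group name is the first column of each line in alert_group.cfg.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of SADMIN) to verify the default alert group.

# cfg_value FILE KEY : print the value of a "KEY = value" line from a
# sadmin.cfg-style file, ignoring comment lines and surrounding blanks.
cfg_value() {
    local file="$1" key="$2"
    awk -F '=' -v k="$key" '
        /^[ \t]*#/ { next }                      # skip comment lines
        {
            name = $1
            gsub(/[ \t]/, "", name)              # key without blanks
            if (name == k) {
                val = substr($0, index($0, "=") + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", val) # trim the value
                print val
                exit
            }
        }' "$file"
}

# group_exists FILE GROUP : succeed if GROUP is the first field of a line
# in an alert_group.cfg-style file (comment lines start with "#").
group_exists() {
    [ -n "$2" ] || return 1
    awk -v g="$2" '$1 == g { found = 1 } END { exit !found }' "$1"
}

# Example use with the default SADMIN paths (skipped if $SADMIN is not set).
if [ -n "${SADMIN:-}" ] && [ -r "$SADMIN/cfg/sadmin.cfg" ]; then
    grp=$(cfg_value "$SADMIN/cfg/sadmin.cfg" SADM_ALERT_GROUP)
    if group_exists "$SADMIN/cfg/alert_group.cfg" "$grp"; then
        echo "Default alert group '$grp' is defined."
    else
        echo "Default alert group '$grp' is missing from alert_group.cfg." >&2
    fi
fi
```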


The default alert type

The alert type is used only at the end of a script, to decide whether an alert (notification) needs to be sent. The default alert type is set at installation time to ‘1’ (alert only on error). You can view the current default alert type in the SADMIN configuration file ($SADMIN/cfg/sadmin.cfg). When you run a Python or shell script built with the SADMIN Tools, it uses this default value.

#----------------------------------------------------------------------------
# Default option for sending alert (notification) after a SCRIPT terminates.
# This default can be overridden by changing the 'SADM_ALERT_TYPE' variable
# in the 'SADMIN code section' at the top of your script.
# 0 = Don't send any alert when script terminate.
# 1 = Send alert only if script terminate with error.
# 2 = Send alert only if script terminate with success.
# 3 = Always send notification with status when script terminate.
#----------------------------------------------------------------------------
SADM_ALERT_TYPE = 1

The alert type has four possible values:
0 No alert is sent, whether the script finishes with success or failure.
1 An alert is sent only if the script finishes with an error (exit code not equal to 0).
2 An alert is sent only if the script finishes with success (exit code = 0).
3 An alert is always sent, whether the script finishes with success or failure.
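The four values reduce to a simple test on the script's exit code. The hypothetical function below, `should_send_alert`, expresses that rule in shell. It illustrates the behaviour described above and is not the code SADMIN runs when a script ends.

```bash
#!/usr/bin/env bash
# should_send_alert ALERT_TYPE EXIT_CODE   (hypothetical, not a SADMIN function)
# Returns 0 when an alert should be sent, following the SADM_ALERT_TYPE
# meanings: 0=never, 1=on error, 2=on success, 3=always.
# Returns 2 for an unknown alert type.
should_send_alert() {
    local alert_type="$1" exit_code="$2"
    case "$alert_type" in
        0) return 1 ;;                       # never alert
        1) [ "$exit_code" -ne 0 ] ;;         # alert only on error
        2) [ "$exit_code" -eq 0 ] ;;         # alert only on success
        3) return 0 ;;                       # always alert
        *) echo "Invalid alert type: $alert_type" >&2; return 2 ;;
    esac
}

# Usage: with the default type (1), a run ending with exit code 1 alerts.
if should_send_alert 1 1; then
    echo "Alert would be sent"
fi
```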


Overriding default alert group and alert type

You can override the default alert group and alert type by uncommenting the variables SADM_ALERT_TYPE and/or SADM_ALERT_GROUP (shell script) or st.cfg_alert_type and/or st.cfg_alert_group (Python script) in the SADMIN section near the top of the script, and setting them to the values you want.

Shell Script

#export SADM_ALERT_TYPE=1            # 0=None 1=AlertOnErr 2=AlertOnOK 3=Always
#export SADM_ALERT_GROUP="default"   # AlertGroup Use for Alert(alert_group.cfg)

Python Script

#st.cfg_alert_type  = 1              # 0=None 1=AlertOnErr 2=AlertOnOK 3=Always
#st.cfg_alert_group = "default"      # AlertGroup use for Alert(alert_group.cfg)
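These lines are placed near the top of the template. Assuming they run after the defaults from sadmin.cfg have been loaded, an uncommented line simply reassigns the variable, and the last assignment wins. The sketch below relies on that ordering. Its first two assignments are hypothetical stand-ins for the values read from sadmin.cfg, and the range check on the type is an extra safeguard, not SADMIN behaviour.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the defaults loaded from $SADMIN/cfg/sadmin.cfg.
export SADM_ALERT_TYPE=1
export SADM_ALERT_GROUP="default"

# Uncommented override lines in the SADMIN section: later assignments win.
export SADM_ALERT_TYPE=3
export SADM_ALERT_GROUP="sprod"

# Optional sanity check (not part of SADMIN): the type must be 0 to 3.
case "$SADM_ALERT_TYPE" in
    [0-3]) echo "Alert type $SADM_ALERT_TYPE, group $SADM_ALERT_GROUP" ;;
    *)     echo "Invalid SADM_ALERT_TYPE: $SADM_ALERT_TYPE" >&2 ;;
esac
```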


Overriding the default alert group in SADMIN System Monitor

In the System Monitor (SysMon) configuration file, the alert group is specified in column J for warnings and in column K for errors. In the example below, we check the space usage of the “/opt” filesystem. The filesystem usage is currently at 69%, the warning threshold is 85% and the error threshold is 90%.

#Column 1  2  3  4  5   6   7    8   9 A B C D E F G    H       I     J      K   L
FS/opt     69 >= 85 90 000 0000 0000 Y Y Y Y Y Y Y Y 00000000 0000 sdevops sprod -
  • If the filesystem usage becomes greater than or equal to 85%, a warning alert is sent to the ‘sdevops’ alert group.
  • If the filesystem usage becomes greater than or equal to 90%, an error alert is sent to the ‘sprod’ alert group.
  • If an alert group used in the SysMon configuration doesn’t exist in the alert group file, the group ‘default’ is used.
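The threshold rule above can be sketched in shell. SysMon itself is the Perl script sadm_sysmon.pl, so the hypothetical function below, `sysmon_alert`, only illustrates the logic for a line using the `>=` operator. Field positions are taken from the column header in the example: column J is the 19th whitespace-separated field and column K is the 20th.

```bash
#!/usr/bin/env bash
# sysmon_alert LINE GROUP_FILE   (hypothetical, not part of SADMIN)
# Evaluates one SysMon line that uses ">=" and prints "ok",
# "warning <group>" or "error <group>".
# Fields: 2=current value, 3=operator, 4=warning threshold,
# 5=error threshold, 19 (column J)=warning group, 20 (column K)=error group.
sysmon_alert() {
    local line="$1" group_file="$2" level group
    local -a f
    read -r -a f <<< "$line"            # f[0] is column 1, f[18] is column J
    if [ "${f[2]}" != ">=" ]; then
        echo "unsupported operator: ${f[2]}" >&2
        return 2
    fi
    if [ "${f[1]}" -ge "${f[4]}" ]; then
        level=error;   group="${f[19]}"
    elif [ "${f[1]}" -ge "${f[3]}" ]; then
        level=warning; group="${f[18]}"
    else
        echo ok
        return 0
    fi
    # Fall back to 'default' when the group is not in the alert group file.
    if ! awk -v g="$group" '$1 == g { found = 1 } END { exit !found }' "$group_file"; then
        group=default
    fi
    echo "$level $group"
}
```

With the “/opt” line shown above (usage 69%), `sysmon_alert` prints `ok`. At 87% it would name the warning group, and at 92% the error group.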


Example of overriding default alert group and alert type in a script

The following example shows how to override the default alert group and alert type in your script. We will use the Slack alert method in this example.

Before we begin, here is an example of the four types of entries (m, t, c, s) you can use in a SADMIN alert group file: email (m), SMS/texto (t), cellular number (c) and Slack (s).

Example of a typical alert group file

# Email Alert Group 
e_sysadmin          m   batman@batcave.com
e_webteam           m   batman@batcave.com,robin@batcave.com

# SMS (Texto) Alert Group, members are Cellular name
sms_sysadmin        t   cell_support
sms_network         t   cell_support,cell_telecom
sms_emergency       t   cell_support,cell_telecom,cell_custsup

# Cellular number of individual (Use only in SMS alert group)
cell_support        c   5147577444
cell_telecom        c   5147560444
cell_custsup        c   4507570468
cell_support_1      c   4187779402
cell_support_2      c   5147779444

# Slack Alert Group
# 3rd column is the channel name defined in Slack Alert file (alert_slack.cfg).
sdev                s   sadm_dev
sprod               s   sadm_prod
sinfo               s   sadm_info
sdevops             s   sadm_devops
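Each line of the alert group file holds a name, an entry type and a comma-separated list of members. The sketch below, `resolve_group`, is a hypothetical helper and not a SADMIN command. It looks up a group and, for an SMS group (`t`), replaces each member name with the number from its `c` entry. It assumes members are separated by commas without spaces, as in the example above. Slack channels (`s`) are printed as-is, since the layout of alert_slack.cfg is not covered here.

```bash
#!/usr/bin/env bash
# resolve_group FILE GROUP   (hypothetical, not part of SADMIN)
# Prints "<type> <members>" for GROUP from an alert_group.cfg-style file.
# For an SMS group ('t'), member names are replaced by their 'c' numbers.
resolve_group() {
    local file="$1" group="$2" type="" members="" m number out=""
    local -a list
    read -r type members < <(awk -v g="$group" '$1 == g { print $2, $3; exit }' "$file")
    if [ -z "$type" ]; then
        echo "unknown group: $group" >&2
        return 1
    fi
    if [ "$type" = "t" ]; then
        IFS=',' read -r -a list <<< "$members"
        for m in "${list[@]}"; do
            # Look up the cellular ('c') entry for this member name.
            number=$(awk -v n="$m" '$1 == n && $2 == "c" { print $3; exit }' "$file")
            if [ -z "$number" ]; then
                echo "no cellular entry for $m" >&2
                continue
            fi
            out="${out:+$out,}$number"
        done
        members="$out"
    fi
    echo "$type $members"
}
```

With the file above, `resolve_group "$SADMIN/cfg/alert_group.cfg" sms_network` should print the two cellular numbers of cell_support and cell_telecom, separated by a comma.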

Our example is based on the alert group file ($SADMIN/cfg/alert_group.cfg) shown below.

Alert Group File

Here is a portion of the Slack alert file ($SADMIN/cfg/alert_slack.cfg) used in our example.

Slack Alert File

Let’s copy the shell template script to create our test script. In our test script, we will change the alert group from the default to ‘sprod’ and the alert type from 1 to 3. Begin by typing the following commands to create our test script, named ‘sadm_test_alert.sh’:

# cd $SADMIN/usr/bin
# cp $SADMIN/bin/sadm_template.sh sadm_test_alert.sh
# nano sadm_test_alert.sh

With your favorite editor, change the circled lines.

Script before the change

Script after the change
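You can also turn on the two override lines with sed instead of an editor. This sketch assumes the copied template still contains the commented lines shown in the Shell Script section above (`#export SADM_ALERT_TYPE=1` and `#export SADM_ALERT_GROUP="default"`). If your template version differs, the patterns will not match and the file is left unchanged. The function name `enable_overrides` is hypothetical.

```bash
#!/usr/bin/env bash
# enable_overrides FILE   (hypothetical helper)
# Uncomments the SADMIN override lines and sets type 3 and group 'sprod'.
# Writes through a temporary file, so it works with both GNU and BSD sed.
enable_overrides() {
    local script="$1" tmp
    tmp=$(mktemp) || return 1
    sed -e 's/^#export SADM_ALERT_TYPE=[0-3]/export SADM_ALERT_TYPE=3/' \
        -e 's/^#export SADM_ALERT_GROUP="[^"]*"/export SADM_ALERT_GROUP="sprod"/' \
        "$script" > "$tmp" || { rm -f "$tmp"; return 1; }
    cat "$tmp" > "$script"      # rewrite in place, keeping the file's permissions
    rm -f "$tmp"
}

# Usage, from $SADMIN/usr/bin:
#   enable_overrides sadm_test_alert.sh
#   grep -E '^export SADM_ALERT_(TYPE|GROUP)=' sadm_test_alert.sh
```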

Running our test script

Alert that we received in Slack

A look at the generated ‘log’ and ‘rch’ files

It’s as simple as that. I hope you now have a better understanding of how the alerting system works in SADMIN. We could have done the same test using the Python template script ($SADMIN/bin/sadm_template.py).


SEE ALSO

Link to …                             Description
sview                                 Command line summary of alerts and failed scripts of all your servers.
sadm_daily_report                     Produce and email daily monitoring reports.
sadmin.cfg                            SADMIN main configuration file.
sadm_sysmon.pl                        Client system monitor.
sadm_fetch_clients.sh                 rsync all .rch/.log/.rpt files from active clients to the SADMIN server.
SysMon configuration file             Client System Monitor configuration file.
smon                                  Run SysMon and view the report file.
How-to create a Slack workspace       Create a Slack workspace.
How-to create Slack channel and App.  Configure a Slack channel.
How-to use SADMIN alerting system     Understanding the SADMIN alerting system.