System Monitor configuration file

Updated: 2018/08/11
O/S : Aix, Linux, MacOS

 
NAME

{HOSTNAME}.smon   -   System Monitor configuration file.

 
SYNOPSIS

{HOSTNAME}.smon

  • This file is located in the "${SADMIN}/cfg" directory and is named after the host name, with the "smon" extension. For a server named "holmes", the System Monitor opens the file "$SADMIN/cfg/holmes.smon". The file is loaded into memory when the SADMIN System Monitor starts, all requested tests are performed, and the updated file is then written back to disk.
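A minimal Python sketch of the naming rule above, for readers who want to locate the file from their own tooling. It is not part of SADMIN (the System Monitor itself is the Perl script sadm_sysmon.pl). The helper name `smon_path` is hypothetical. It assumes the `SADMIN` environment variable is set, and that the name returned by `socket.gethostname()` matches the one SysMon uses. That may not hold if your system reports a fully qualified name.

```python
import os
import socket
from typing import Optional


def smon_path(sadmin_dir: Optional[str] = None, hostname: Optional[str] = None) -> str:
    """Return the path of the System Monitor configuration file for a host.

    Hypothetical helper: it only mirrors the rule ${SADMIN}/cfg/<hostname>.smon.
    """
    if sadmin_dir is None:
        sadmin_dir = os.environ["SADMIN"]   # assumed to be set, as on a SADMIN host
    if hostname is None:
        hostname = socket.gethostname()     # may be a fully qualified name on some systems
    return os.path.join(sadmin_dir, "cfg", f"{hostname}.smon")
```

For example, `smon_path("/opt/sadmin", "holmes")` returns `/opt/sadmin/cfg/holmes.smon`. The `/opt/sadmin` directory is only an example location.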
 
DESCRIPTION
  • Blank lines and lines that begin with a "#" are ignored by the System Monitor

  • #---------------------------------------------------------------------------------------------------
    # v2.5 - SADM System Monitor Configuration file
    #   The file needs to be placed in ${SADMIN}/cfg and named `hostname`.smon
    #   This file is read and updated by the SADMIN System Monitor (sadm_sysmon.pl) every time it runs.
    #
    #     - New filesystem lines are added automatically to this file when detected.
    #     - A filesystem can be increased automatically by 10%, at most twice within 24 hours,
    #         - if the last field (Script to Run field) of the filesystem line contains "sadm_fs_incr.sh".
    #
    #     - Comment lines must always begin with a "#" (column 1).
    #     - Columns are delimited by one or more spaces and/or tabs.
    #     - At execution, each line is evaluated based on the test requested in the first column.
    #         - The result of the operation is stored in the second column.
    #
    #---------------------------------------------------------------------------------------------------
    # Col   Name            Description
    #---------------------------------------------------------------------------------------------------
    #  1    IDENTIFIER      This name specifies the test to run.
    #                       The identifiers currently supported are:
    #
    #       'script:yourscript.sh'
    #               Use this one to run a custom script you wrote.
    #               Place the word 'script:' in column 1, followed by the name of your script.
    #               Make sure your script exits with a return code of 0 or 1, so the result can be tested.
    #               Keep the script execution time as short as possible, because the script runs
    #               every time the System Monitor runs.
    #       load_average
    #               Returns the system load average of the past 5 minutes, using 'uptime' (Linux, Aix, MacOS).
    #               The actual value is returned in column 2 of the line.
    #       cpu_level
    #               Returns the CPU usage using the 'vmstat' command (Linux/Aix) and 'iostat' on MacOS.
    #               The sum of the 'usr' and 'sys' values is returned in column 2 of the line.
    #       swap_space
    #               Returns the percentage of Swap/Paging space used.
    #               On Linux we use 'free | grep -i swap', on Aix 'lsps -a' and on MacOS
    #               'sysctl vm.swapusage'. The actual value is returned in column 2 of the line.
    #
    #       service_
    #               Returns 1 if the service is active and 0 if it's not.
    #               The service name follows the identifier prefix 'service_'.
    #
    #               One or more service names can be specified.
    #               For example, Red Hat uses the service name 'crond' while Ubuntu uses 'cron'.
    #               To test whether the cron daemon is running on both systems, specify both names.
    #               Example : service_crond,cron
    #               When specifying multiple service names, separate them with a comma ','.
    #               If at least one of them is active, 1 is returned.
    #
    #               If the service is down, you can run one of your own scripts to bring it up, or use
    #               the one that comes with SADMIN ($SADMIN/usr/mon/srestart.sh).
    #               Put the restart script name in the last field of the line (column L) to have it
    #               executed. The script must exist in the '$SADMIN/usr/mon' directory and be executable.
    #               If the service is down, the script is executed, but no more than twice in a
    #               24-hour period.
    #               SADMIN SysMon does this by looking at column 6 (RunCounter), H (date) and I (time).
    #                   - The H and I fields contain the date and time of the last script execution.
    #                   - Column 6 contains the number of times the script ran in the last 24 hours.
    #
    #               With this information, SysMon knows whether the script ran in the last 24 hours.
    #                   - If it didn't run in the last 24 hours, the RunCounter (col 6) is reset to 1
    #                     and the script specified at the end of the line is executed.
    #                   - If the script was executed in the last 24 hours and the RunCounter is less
    #                     than 2, the script is executed.
    #                   - If the script was executed in the last 24 hours and the RunCounter is greater
    #                     than 1, the script is not executed, since it already ran twice in the last
    #                     24 hours.
    #
    #       daemon_
    #               Checks whether the specified name is present in the system process list (ps).
    #               The search is case sensitive.
    #               Example: "daemon_batcode"
    #                   - Returns 0 if 'batcode' is not in the process list.
    #                   - Returns a value greater than zero if it appears in the process list.
    #                     If it appears four times, 4 is returned.
    #
    #       check_multipath
    #               SysMon checks each path of the multipath devices; if one of them is not active or
    #               ready, an error is detected. A value of 1 is returned if everything is OK and 0
    #               when an error is detected.
    #               The line is only evaluated if the command 'multipathd' is present on the system.
    #                       
    #       http_
    #               Tests whether the web site responds (1=Up, 0=No response).
    #               Example : http_sysinfo.maison.ca
    #
    #       ping_
    #               Every time SysMon runs, it pings the host name or IP address you specified after
    #               the prefix 'ping_'.
    #               Example : ping_www.google.com
    #               This pings the 'www.google.com' web site and returns 0 if it responds, 1 if not.
    #
    #       FS      Every time SysMon runs, the server smon configuration file is loaded into memory,
    #               updated in memory during execution and written back to disk when SysMon ends.
    #               This is how SysMon detects a newly created filesystem on the system and adds it
    #               automatically to its configuration file.
    #               The initial warning threshold is 80% and the initial error threshold is 90%.
    #               You can change these values; your settings are preserved on later runs.
    #
    #---------------------------------------------------------------------------------------------------
    # Col   Name            Description
    #  2    VALUE RETURNED  The value returned by the test requested in column 1.
    #                       For 'FS' it is the percentage used, for 'ping_' it is 0 or 1,
    #                       and for 'service_' it is 1 if the service is active and 0 if it's not.
    #---------------------------------------------------------------------------------------------------
    #  3    Operator        Operator used to compare the actual value (col 2)
    #       =,!=,<,>,>=,<=  with the 'Warning' (col 4) and the 'Error' (col 5) levels.
    #---------------------------------------------------------------------------------------------------
    #  4    Warning Level   This value is compared against the value returned, using the operator in
    #                       column 3. If the comparison is true, a warning is raised.
    #                       If you leave this value at 0, the warning threshold is not evaluated.
    #---------------------------------------------------------------------------------------------------
    #  5    Error Level     This value is compared against the value returned, using the operator in
    #                       column 3. If the comparison is true, an error is raised.
    #                       If you leave this value at 0, the error threshold is not evaluated.
    #---------------------------------------------------------------------------------------------------
    #  6    Duration Min/   This field has a dual purpose:
    #       RunCounter          - For load_average, cpu_level and swap_space, it's the number of minutes
    #                             the value returned must exceed the Warning or Error level before a
    #                             warning or an error is raised.
    #                             Example : You set the warning threshold of the 'cpu_level' line at 80%,
    #                                       but you don't want to raise a warning if the CPU level
    #                                       exceeds 80% for only 2 minutes. You can set this column
    #                                       to 120, so a warning is raised only when the CPU level
    #                                       exceeds 80% for at least 120 continuous minutes. If you
    #                                       leave it at 0, a warning is raised immediately whenever
    #                                       SysMon runs while the CPU level exceeds 80%.
    #                           - For service (service_) and filesystem (FS) lines, it represents the
    #                             number of times the service was restarted or the filesystem was
    #                             increased in the last 24 hours. You don't need to change this field,
    #                             it's maintained automatically (unless you want to reset it to 0 to
    #                             allow another service restart or filesystem increase within 24 hours).
    #---------------------------------------------------------------------------------------------------
    #  7    Monitor         Hour of the day you want to enable evaluation of the line (24-hour HHMM format).
    #       Start Time      If you only want to evaluate a line between 7am and 9pm, you would
    #                       put 0700 in this column and 2100 in column 8 (Monitor End Time).
    #                       If you leave this column at 0000, the line is evaluated 24 hours a day,
    #                       unless specified otherwise in columns 9 to G.
    #---------------------------------------------------------------------------------------------------
    #  8    Monitor         Hour of the day you want to disable evaluation of the line (24-hour HHMM format).
    #       End Time        If you only want to evaluate a line between 7am and 9pm, you would
    #                       put 0700 in column 7 (Monitor Start Time) and 2100 in this column.
    #                       If you leave this column at 0000, the line is evaluated 24 hours a day,
    #                       unless specified otherwise in columns 9 to G.
    #---------------------------------------------------------------------------------------------------
    #  9    Enable Sunday   If this value is set to 'Y', the line will be evaluated on Sunday (Default).
    #                       If this column is set to 'N', the line will not be evaluated on Sunday.
    #---------------------------------------------------------------------------------------------------
    #  A    Enable Monday   If this value is set to 'Y', the line will be evaluated on Monday (Default).
    #                       If this column is set to 'N', the line will not be evaluated on Monday.
    #---------------------------------------------------------------------------------------------------
    #  B    Enable Tuesday  If this value is set to 'Y', the line will be evaluated on Tuesday (Default).
    #                       If this column is set to 'N', the line will not be evaluated on Tuesday.
    #---------------------------------------------------------------------------------------------------
    #  C    Enable Wed.     If this value is set to 'Y', line will be evaluated on Wednesday (Default).
    #                       If this column is set to 'N', line will not be evaluated on Wednesday.
    #---------------------------------------------------------------------------------------------------
    #  D    Enable Thursday If this value is set to 'Y', the line will be evaluated on Thursday (Default).
    #                       If this column is set to 'N', the line will not be evaluated on Thursday.
    #---------------------------------------------------------------------------------------------------
    #  E    Enable Friday   If this value is set to 'Y', the line will be evaluated on Friday (Default).
    #                       If this column is set to 'N', the line will not be evaluated on Friday.
    #---------------------------------------------------------------------------------------------------
    #  F    Enable Saturday If this value is set to 'Y', the line will be evaluated on Saturday (Default).
    #                       If this column is set to 'N', the line will not be evaluated on Saturday.
    #---------------------------------------------------------------------------------------------------
    #  G    Active/Inactive If this value is set to 'Y', the line is active and evaluated (Default).
    #                       If this column is set to 'N', the line is never evaluated.
    #                       Use this when you want to temporarily deactivate a line.
    #---------------------------------------------------------------------------------------------------
    #  H    Last Event Date This column contains the date (YYYYMMDD) of the last event (script execution,
    #                       cpu_level exceeded, swap_space exceeded, filesystem increase, service restart, ...).
    #                       It is used by SysMon when conditions are related to a time period.
    #                       You don't need to change this field; SysMon takes care of it.
    #---------------------------------------------------------------------------------------------------
    #  I    Last Event Time This column contains the time (HHMM) of the last event (script execution,
    #                       cpu_level exceeded, swap_space exceeded, filesystem increase, service restart, ...).
    #                       It is used by SysMon when conditions are related to a time period.
    #                       You don't need to change this field; SysMon takes care of it.
    #---------------------------------------------------------------------------------------------------
    #  J    Group Name      Group name to notify (Slack channel name, future version). Not used for now.
    #---------------------------------------------------------------------------------------------------
    #  K    Email Group     Email group name. For now, email is sent to the SADMIN administrator address.
    #---------------------------------------------------------------------------------------------------
    #  L    Script Name     Never used by SysMon if it is blank or contains '-'.
    #                       If you put a script name here and the line evaluates to an error, the script
    #                       is executed (you may want to use it to correct the error).
    #                       The script MUST reside in the $SADMIN/usr/mon directory and be executable.
    #                       The script produces a log in that same directory, with the same name as
    #                       your script.
    #                       For filesystem increases, the script name MUST be 'sadm_fs_incr.sh'.
    #---------------------------------------------------------------------------------------------------
    # IDENTIFIER - COLUMN 1          2  3   4   5   6  7    8   9 A B C D E F G     H     I   J    K   L
    #---------------------------------------------------------------------------------------------------
    #
    # ----- Run specific scripts, check return code and issue a warning or error based on threshold
    # Example 
    #script:stemplate.sh       1  =  00  01 000 0000 0000 Y Y Y Y Y Y Y Y 20180911 1520 mail sadmin -
    #
    # SADMIN script, do not remove: this script makes sure the 'nmon' performance collector is running.
    script:swatch_nmon.sh            0  =  00  01 000 0000 0000 Y Y Y Y Y Y Y Y 20180911 1520 mail sadmin -
    #
    #
    # ----- Aix/Linux/MacOS CPU Load, Server Load Average and Swap Space usage Monitoring
    load_average                     0  >  20  35 120 0700 2100 Y Y Y Y Y Y Y Y 00000000 0000 mail sadmin -
    cpu_level                       29 >=  85  95 240 0700 2100 Y Y Y Y Y Y Y Y 00000000 0000 mail sadmin -
    swap_space                      14  >  85  90 000 0000 0000 Y Y Y Y Y Y Y Y 20180911 1520 mail sadmin -
    #
    # ----- Linux Service Monitoring
    service_crond,cron               1  <  00  01 000 0000 0000 Y Y Y Y Y Y Y Y 20180820 1520 mail sadmin srestart.sh
    service_chronyd,ntp,ntpd         0  <  00  01 000 0000 0000 Y Y Y Y Y Y Y Y 20180820 1520 mail sadmin srestart.sh
    service_ssh,sshd                 1  <  00  01 000 0000 0000 Y Y Y Y Y Y Y Y 20180820 1520 mail sadmin srestart.sh
    service_postfix,sendmail         1  <  00  01 000 0000 0000 Y Y Y Y Y Y Y Y 20180820 1520 mail sadmin srestart.sh
    service_syslog,rsyslog,syslogd   0  <  00  01 000 0000 0000 Y Y Y Y Y Y Y Y 20180820 1520 mail sadmin srestart.sh
    #service_at,atd                   1  <  00  01 000 0000 0000 Y Y Y Y Y Y Y Y 20180820 1520 mail sadmin srestart.sh
    #service_named,named-chroot       0  <  00  01 000 0000 0000 Y Y Y Y Y Y Y Y 20180820 1520 mail sadmin srestart.sh
    #service_dhcpd                    0  <  00  01 000 0000 0000 Y Y Y Y Y Y Y Y 20180820 1520 mail sadmin srestart.sh
    #
    # ----- Monitor Daemon or Process Running 
    #daemon_mydaemon                 1  <  00  01 000 0000 0000 Y Y Y Y Y Y Y Y 20180820 1520 mail sadmin -
    #
    # ----- Check Linux Multipath Status (0=Error 1=All path(s) are Online/Ready)
    check_multipath                  1 !=  00  01 000 0000 0000 Y Y Y Y Y Y Y Y 00000000 0000 mail sadmin -
    #
    # Ping server (Return 0=OK 1=Error). The line below raises a warning if the site doesn't respond to ping.
    ping_www.google.com              0  =  01  00 000 0000 0000 Y Y Y Y Y Y Y Y 00000000 0000 mail sadmin -
    
    # Ping server (Return 0=OK 1=Error). The line below raises an error if the site doesn't respond to ping.
    ping_www.ibm.com                 0  =  00  01 000 0000 0000 Y Y Y Y Y Y Y Y 00000000 0000 mail sadmin -
    #
    #
    #
    # ----- Filesystem Monitoring
    # A filesystem can be increased by 10%, at most twice within 24 hours, if "sadm_fs_incr.sh" is in column L.
    # FS/example                    23 >=  25  90 001 0000 0000 Y Y Y Y Y Y Y Y 20180615 0824 mail sadmin sadm_fs_incr.sh
    #---------------------------------------------------------------------------------------------------
    # IDENTIFIER - COLUMN 1          2  3   4   5   6  7    8   9 A B C D E F G     H     I   J    K   L
    #---------------------------------------------------------------------------------------------------
    #
    #SADMSTAT 2.5 holmes - Sat Feb  4 09:56:10 2017 - Execution Time 5.00 seconds
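The column layout documented in the file header can be sketched as a small Python parser. This is an illustration only, not the code of sadm_sysmon.pl. The function name `parse_smon_line` and the field names in `COLUMNS` are hypothetical labels for columns 1 to L. The four-space indentation shown on this page is only display; in the real file, comments start in column 1.

```python
from typing import Dict, List, Optional

# Hypothetical labels for columns 1-8, 9, A-F (days), G, H, I, J, K and L.
COLUMNS: List[str] = [
    "identifier",     # 1
    "value",          # 2
    "operator",       # 3
    "warning",        # 4
    "error",          # 5
    "minutes",        # 6  duration in minutes, or RunCounter
    "start_time",     # 7
    "end_time",       # 8
    "sun", "mon", "tue", "wed", "thu", "fri", "sat",   # 9, A, B, C, D, E, F
    "active",         # G
    "last_date",      # H
    "last_time",      # I
    "group",          # J
    "email_group",    # K
    "script",         # L
]


def parse_smon_line(line: str) -> Optional[Dict[str, str]]:
    """Split one .smon line into named columns; return None for ignored lines."""
    if not line.strip() or line.startswith("#"):
        return None                      # blank lines and comments (# in column 1) are skipped
    fields = line.split()                # any run of spaces and/or tabs is a delimiter
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}: {line!r}")
    return dict(zip(COLUMNS, fields))
```

Because any run of spaces or tabs acts as a delimiter, the aligned layout of the sample lines is purely cosmetic.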
    
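Column 1 combines a test type with its argument. The sketch below shows one way to split an identifier into its test type and arguments. It also shows how the 'service_' and 'daemon_' values described above could be derived. All names are hypothetical. The caller-supplied `is_active` check and the list of `ps` output lines stand in for whatever SysMon actually queries.

```python
from typing import Callable, List, Tuple

# Fixed identifiers are matched exactly; the others are prefixes followed by an argument.
FIXED = ("load_average", "cpu_level", "swap_space", "check_multipath")
PREFIXES = ("script:", "service_", "daemon_", "http_", "ping_", "FS")


def classify_identifier(ident: str) -> Tuple[str, List[str]]:
    """Return (test_type, arguments) for a column 1 identifier."""
    if ident in FIXED:
        return ident, []
    for prefix in PREFIXES:
        if ident.startswith(prefix):
            arg = ident[len(prefix):]
            if prefix == "service_":
                # Several service names may be listed, separated by commas.
                return "service", arg.split(",")
            return prefix.rstrip(":_"), [arg]
    raise ValueError(f"unsupported identifier: {ident!r}")


def service_value(names: List[str], is_active: Callable[[str], bool]) -> int:
    """1 if at least one of the listed services is active, otherwise 0."""
    return 1 if any(is_active(name) for name in names) else 0


def daemon_value(ps_lines: List[str], name: str) -> int:
    """Number of process-list lines containing `name` (case sensitive)."""
    return sum(1 for line in ps_lines if name in line)
```

For instance, `service_crond,cron` yields the names `crond` and `cron`. The line's value is 1 as soon as either one is reported active.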

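The interplay of columns 3, 4 and 5 can be sketched in Python as follows. Two points are assumptions not stated on this page. First, the error level is checked before the warning level. Second, both spellings of the "or equal" operators ('>=' and '=>', '<=' and '=<') are accepted, since the spelling sadm_sysmon.pl expects is not confirmed here. The duration in column 6 is deliberately left out. `OPERATORS` and `evaluate` are hypothetical names.

```python
import operator
from typing import Callable, Dict

# Column 3 operators. Both spellings of the "or equal" comparisons are accepted
# because the spelling expected by sadm_sysmon.pl is an assumption here.
OPERATORS: Dict[str, Callable[[float, float], bool]] = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    ">=": operator.ge, "=>": operator.ge,
    "<=": operator.le, "=<": operator.le,
}


def evaluate(value: float, op: str, warning: float, error: float) -> str:
    """Compare column 2 with columns 4 and 5; a level of 0 is not evaluated."""
    compare = OPERATORS[op]
    # Assumption: the error level takes precedence over the warning level.
    if error != 0 and compare(value, error):
        return "error"
    if warning != 0 and compare(value, warning):
        return "warning"
    return "ok"
```

With the sample line `cpu_level 29 >= 85 95`, `evaluate(29, ">=", 85, 95)` returns `"ok"`, while a value of 90 returns `"warning"`. The service lines (`1 < 00 01`) only reach the error level when the value drops to 0.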

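Columns 7 to G decide whether a line is evaluated at all. The following sketch applies them to a line already split into a dictionary; the key names and the function `should_evaluate` are hypothetical. It follows this page literally, so 0000 in column 7 or 8 means the line is evaluated all day. It also assumes the start time is inclusive and the end time exclusive. Windows that cross midnight are not handled.

```python
from datetime import datetime
from typing import Dict

DAY_FLAGS = ("sun", "mon", "tue", "wed", "thu", "fri", "sat")  # columns 9, A, B, C, D, E, F


def should_evaluate(rec: Dict[str, str], now: datetime) -> bool:
    """Decide whether a parsed line is evaluated at time `now` (columns 7, 8, 9-F and G)."""
    if rec["active"] != "Y":                      # column G: 'N' disables the line
        return False
    day = DAY_FLAGS[now.isoweekday() % 7]         # isoweekday(): Monday=1 .. Sunday=7
    if rec[day] != "Y":
        return False
    start, end = rec["start_time"], rec["end_time"]
    if start == "0000" or end == "0000":          # 0000 means "24 hours a day"
        return True
    hhmm = now.strftime("%H%M")                   # zero-padded, so string comparison works
    return start <= hhmm < end                    # assumed: start inclusive, end exclusive
```

With 0700 and 2100 in columns 7 and 8, a run at 15:20 on a Tuesday is evaluated, while a run at 22:00 is skipped.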
 
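The twice-in-24-hours rule for restart scripts (columns 6, H and I, described under 'service_') can be sketched as a pure function. This is an illustration, not SysMon's code. The name `restart_decision` is hypothetical, and an all-zero date in column H is assumed to mean the script never ran. When the script runs, the caller is expected to write the new counter and the current date and time back to columns 6, H and I.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

MAX_RUNS_PER_DAY = 2   # "no more than twice in a 24 Hrs period"


def restart_decision(run_counter: int, last_date: str, last_time: str,
                     now: datetime) -> Tuple[bool, int]:
    """Return (run_script, new_run_counter) for a line in error (columns 6, H and I)."""
    last_run: Optional[datetime]
    if last_date == "00000000":
        last_run = None                            # assumed: all zeros means "never ran"
    else:
        last_run = datetime.strptime(last_date + last_time, "%Y%m%d%H%M")
    if last_run is None or now - last_run > timedelta(hours=24):
        return True, 1                             # not run in the last 24 h: reset to 1 and run
    if run_counter < MAX_RUNS_PER_DAY:
        return True, run_counter + 1               # ran once in the last 24 h: run again
    return False, run_counter                      # already ran twice in the last 24 h: skip
```

For a line last restarted at 20180820 1520 with a RunCounter of 2, a check at 18:00 the same day returns `(False, 2)`. The same line checked two days later returns `(True, 1)`.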
AUTHOR
Jacques Duplessis (jacques.duplessis@sadmin.ca).
Suggestions and bug reports can be sent via http://www.sadmin.ca/support.php

 
COPYRIGHT
Copyright © 2018 Free Software Foundation, Inc. License GPLv3+:
    - GNU GPL version 3 or later http://gnu.org/licenses/gpl.html.
This is free software, you are free to change and redistribute it.
There is NO WARRANTY to the extent permitted by law.
 
SEE ALSO
sadm_sysmon.pl   (SADMIN System Monitor)
Copyright © 2015-2019 - www.sadmin.ca - Suggestions, Questions or Report a problem at support@sadmin.ca