sadm_sysmon.pl

Updated: 2018/08/11
O/S : AIX, Linux, macOS

 
NAME

sadm_sysmon.pl   -   Perform the monitoring tests defined in the SysMon configuration file.

 
SYNOPSIS

sadm_sysmon.pl

 
DESCRIPTION
  • This script is executed at regular intervals from the SADMIN client crontab (/etc/cron.d/sadm_client).
  • When the script starts, it tries to open its configuration file, named "HOSTNAME.smon". For a host named "server1", it tries to open "$SADMIN/cfg/server1.smon". If the file exists, it is loaded into memory and processing begins. If the file doesn't exist, the script copies the template "$SADMIN/cfg/.template.smon" to "$SADMIN/cfg/server1.smon" and then proceeds. If the template file can't be found, the script displays an error message and aborts.
  • At the beginning of execution, the SysMon configuration file is read and loaded into memory. The script then executes each test requested in the file, updates the result column and finally writes the updated file back to disk.
  • Blank lines and lines that begin with a "#" are ignored by the System Monitor.
  • Each time the System Monitor runs, the last line of the configuration file is updated. It shows the System Monitor version number, the server name, the date and the time it took to process all the requested tests. Example of the last line:
    #SADMSTAT 2.21 holmes - Sat Aug 11 10:42:08 2018 - Execution Time 6.00 seconds
  • This is a summary of the steps the System Monitor goes through when it is started:
    1. First, it checks whether the System Monitor lock file exists (${SADMIN}/sysmon.lock).
      If it doesn't, it is created and we proceed with the next step.
      If it already exists, SysMon checks when it was created. If it was created more than 30 minutes ago, it is deleted and recreated. If it was created less than 30 minutes ago, a warning message is issued and the monitor doesn't start.

    2. Next, the SADMIN configuration file is read (to get the company name and the email address of the sysadmin).
    3. The System Monitor configuration file is loaded into memory.
    4. The 'df' command is run and its output is loaded into an array, so that new filesystems can be identified later.
    5. An empty System Monitor Report File is created (${SADMIN}/dat/rpt/${HOSTNAME}.rpt).
    6. The 'df' array is scanned and every filesystem not already present in the System Monitor configuration file is added to it.
    7. Each line of the SysMon configuration is then tested and updated in memory. If an error or a warning is found, it is written to the Report File (HOSTNAME.rpt).
    8. Finally, the SysMon array is written out to the new configuration file and the lock file is removed.
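
The lock file handling in step 1 can be sketched as follows. This is a minimal Python illustration, not the script's actual code: the function names are hypothetical, the file's modification time is used as a stand-in for its creation time, and a lock that is exactly 30 minutes old is treated as stale, a case the description leaves open.

```python
import os
import time

LOCK_MAX_AGE_SECONDS = 30 * 60  # 30 minutes, as described in step 1


def acquire_lock(lock_path, now=None):
    """Return True if the lock was created, False if a recent lock blocks the run.

    Assumption: the modification time stands in for the creation time.
    """
    now = time.time() if now is None else now
    if os.path.exists(lock_path):
        age = now - os.path.getmtime(lock_path)
        if age < LOCK_MAX_AGE_SECONDS:
            # A recent lock: warn and refuse to start.
            print(f"Warning: {lock_path} is {age:.0f} seconds old, monitor not started")
            return False
        # A stale lock: delete it, then recreate it below.
        os.remove(lock_path)
    with open(lock_path, "w") as handle:
        handle.write(f"{os.getpid()}\n")
    return True


def release_lock(lock_path):
    """Remove the lock file at the end of the run (step 8)."""
    if os.path.exists(lock_path):
        os.remove(lock_path)
```

A caller would run the tests only when acquire_lock() returns True, and call release_lock() once the configuration file has been written back.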

    Example of output when running System Monitor
        # $SADMIN/bin/sadm_sysmon.pl
    
        Creating lock file /sadmin/sysmon.lock
        Loading SADMIN configuration file /sadmin/cfg/sadmin.cfg
        ---------------------------------------------------------------------------
        SADMIN SYStem MONitor Tools - Version 2.21
        ---------------------------------------------------------------------------
        O/S Name                 = linux
        Debugging Level          = 5        
        SADM_BASE_DIR            = /sadmin
        Hostname                 = holmes
        Virtual Server           = N
        CMD_SSH                  = /bin/ssh
        ---------------------------------------------------------------------------
    
        Loading SysMon configuration file /sadmin/cfg/holmes.smon
        File /sadmin/cfg/holmes.smon loaded in sysmon_array (262 lines loaded)
    
        Checking for new filesystems ...
        - New filesystem Found - /wiki
        1 new filesystem(s) monitored
    
        Execution of script /sadmin/usr/mon/swatch_nmon.sh is requested
        Filename: swatch_nmon - Extension: .sh
        Running script /sadmin/usr/mon/swatch_nmon.sh ... 
        Return code is 0
    
        Checking CPU Load Average ...
        Uptime line:  09:30:02 up 16:58,  1 user,  load average: 0.15, 0.25, 0.27
        Load Average is at 0 - W: 20 E: 35
    
        Checking CPU Usage ...
        CPU Usage line:   0  0      0 395728   9836 4697304    0    0     0     0  861 1053  4  0 96  0  0
        CPU User:   4 - System:   0  - Total:   4 - Warning Level: 85 - Error Level: 95
    
        Checking Swap Space ...
        Swap Info Line: Swap:       3145724           0     3145724
        Swap size: 3145724 - Usage: 0 - Percentage use: 0 %
    
        Checking service crond,cron
        - systemctl status crond.service ... [RUNNING]
        [OK] Service is running - Total returned (1)
    
        Checking service chronyd,ntp,ntpd
        - systemctl status chronyd.service ... [RUNNING]
        [OK] Service is running - Total returned (1)
    
        Checking service ssh,sshd
        - systemctl status ssh.service ... [NOT RUNNING]
        - systemctl status sshd.service ... [RUNNING]
        [OK] Service is running - Total returned (1)
    
        Checking service postfix,sendmail
        - systemctl status postfix.service ... [RUNNING]
        [OK] Service is running - Total returned (1)
    
        Checking Multipath ...
        Multipath status is not in use - Code = (1) (1=ok 0=Error)
    
        Checking response from http://www.linternux.com ... 
        [OK] Web site is responding
    
        Checking response from http://www.sadmin.ca ... 
        [OK] Web site is responding
    
        Test ping to www.google.com ...  OK (0)
        Test ping to www.ibm.com ...  OK (0)
    
        [OK] Filesystem / at 37% ... Warning: 85 - Error: 90
        [WARNING] Filesystem /usr at 84% ... Warning: 80 - Error: 85
        No Script specified for execution in hostname.smon
        No Filesystem increase will happen
        [OK] Filesystem /boot at 52% ... Warning: 85 - Error: 90
        
        [WARNING] Filesystem /coco at 23% ... Warning: 20 - Error: 40
        Filesystem Increase: /coco at 23%
        Actual Date and Time   : 2018 08 12 09 52 08 - 1534081928
        Last increase attempt  : 2018 08 12 09 52 00 - 1534081920
        So 8 seconds since last increase
        Filesystem increase counter: 001 
        Filesystem /coco selected for increase
        Name of script is  ../sadmin/bin/sadm_fs_incr.sh..
      - Command executed: /sadmin/bin/sadm_fs_incr.sh /coco >>/sadmin/log/sadm_fs_incr.sh.log 2>&1
      - [OK] Return Code: 0
    
        [OK] Filesystem /storix at 8% ... Warning: 85 - Error: 90
        [OK] Filesystem /backups at 44% ... Warning: 85 - Error: 90
        [OK] Filesystem /wiki at 5% ... Warning: 85 - Error: 90
        [OK] Filesystem /linternux at 3% ... Warning: 85 - Error: 90
        [OK] Filesystem /tmp at 2% ... Warning: 85 - Error: 90
        [OK] Filesystem /opt at 59% ... Warning: 85 - Error: 90
        [OK] Filesystem /sadmin at 67% ... Warning: 85 - Error: 90
        [OK] Filesystem /sysadmin at 9% ... Warning: 85 - Error: 90
        [OK] Filesystem /coco at 23% ... Warning: 85 - Error: 90
        [OK] Filesystem /wsadmin at 14% ... Warning: 85 - Error: 90
        [OK] Filesystem /mystuff at 8% ... Warning: 85 - Error: 90
        [OK] Filesystem /gitrepos at 7% ... Warning: 85 - Error: 90
        [OK] Filesystem /install at 63% ... Warning: 85 - Error: 90
        [OK] Filesystem /home at 35% ... Warning: 85 - Error: 90
        [OK] Filesystem /psadmin at 4% ... Warning: 85 - Error: 90
        [OK] Filesystem /var at 40% ... Warning: 85 - Error: 90
        -----
        Updating SADM Sysmon configuration file (/sadmin/cfg/holmes.smon)
        Deleting SYStem MONitor lock file /sadmin/sysmon.lock
        #SYSMON 2.21 holmes, Sun Aug 12 09:30:09 2018, Execution Time 7.00 seconds
        #
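
The [OK] and [WARNING] tags on the filesystem lines above follow from comparing the measured value with the warning and error levels. A minimal Python sketch of that comparison is shown below. The function name is hypothetical, and the sketch assumes that higher values are worse and that a value equal to a level already triggers it.

```python
def classify(value, warning_level, error_level):
    """Return "Error", "Warning" or "OK" for a measured value.

    Assumption: reaching a level is enough to trigger it.
    """
    if value >= error_level:
        return "Error"
    if value >= warning_level:
        return "Warning"
    return "OK"


# Values taken from the example output above.
print(classify(37, 85, 90))  # Filesystem / at 37%
print(classify(84, 80, 85))  # Filesystem /usr at 84%
```

The error level is checked first so that a value above both levels is reported once, at the higher severity.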
        


    Example of Report File generated by SysMon
    The SysMon Report File contains the information that will be transmitted to the SADMIN server, so that the proper alert is sent to the proper group or individual. As of this writing, a mail is sent to the email address specified in the SADMIN configuration file. More alert mechanisms are planned. The content of all the RPT files can be seen in the Web Interface on the "SysMon Alerts" page or by running the SysMon Terminal User Interface "sadm_sysmon_tui.pl".

        # cat /sadmin/dat/rpt/holmes.rpt 
        Warning;holmes;2018.08.12;10:02;linux;FILESYSTEM;Filesystem /usr at 84% > 80%;mail;sadmin
        #           
        

    The Report File is a text file in which the fields are delimited by a ";".
    The fields are defined as follows:
    1. "Warning", if the test result reaches or exceeds the warning level specified in the SysMon configuration file.
      "Error", if the value returned by the test reaches or exceeds the error level specified in the SysMon configuration file.
      "Running", indicates that the script is running.
    2. The hostname where the information comes from.
    3. The date the event occurred (YYYY.MM.DD).
    4. The time the event occurred (HH:MM).
    5. The module in which the event occurred (O/S, FileSystem, Script, Service, ...).
    6. The sub-module, which describes the event type more specifically.
    7. The description of the error, warning or process that triggered the event.
    8. The alert type used. For now, mail is the only alerting mechanism, but other types such as "Slack" are being worked on.
    9. The group to which the alert must be sent.
      For the moment, only one group is possible, "sadmin", which means that the email is sent to the sysadmin specified in the SADMIN configuration file.
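
As an illustration of the nine fields, a Report File line could be split as in the Python sketch below. The class and field names are hypothetical labels chosen for this example, and the sketch assumes that the description never contains a ";".

```python
from typing import NamedTuple


class ReportEntry(NamedTuple):
    """One Report File line; field names are illustrative only."""
    status: str        # 1. Warning, Error or Running
    hostname: str      # 2. Host the information comes from
    date: str          # 3. YYYY.MM.DD
    time: str          # 4. HH:MM
    module: str        # 5. Module
    submodule: str     # 6. Sub-module
    description: str   # 7. Description of the event
    alert_type: str    # 8. Alert type (mail)
    alert_group: str   # 9. Alert group (sadmin)


def parse_report_line(line):
    """Split a ';'-delimited Report File line into its nine fields."""
    fields = line.rstrip("\n").split(";")
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, found {len(fields)}")
    return ReportEntry(*fields)


sample = "Warning;holmes;2018.08.12;10:02;linux;FILESYSTEM;Filesystem /usr at 84% > 80%;mail;sadmin"
entry = parse_report_line(sample)
print(entry.status, entry.hostname, entry.description)
```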
  • Every five minutes, the SADMIN server goes through every active server and uses rsync to copy its Report File to the SADMIN server (into $SADMIN/www/dat/${HOSTNAME}/rpt), where it is ready to be processed.
  • For a description of the format and of the way each line is processed, please read the documentation page of the SysMon configuration file.
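
The configuration file lookup described at the start of this section (use HOSTNAME.smon if present, otherwise copy the template, otherwise abort) can be sketched in Python as follows. The function names are hypothetical and the sketch only mirrors the behaviour described here, not the script's actual code.

```python
import os
import shutil


def ensure_sysmon_config(cfg_dir, hostname):
    """Return the path of HOSTNAME.smon, creating it from the template if needed."""
    cfg_path = os.path.join(cfg_dir, f"{hostname}.smon")
    if os.path.exists(cfg_path):
        return cfg_path
    template_path = os.path.join(cfg_dir, ".template.smon")
    if not os.path.exists(template_path):
        # No host file and no template: the run cannot continue.
        raise FileNotFoundError(f"SysMon template {template_path} not found")
    shutil.copyfile(template_path, cfg_path)
    return cfg_path


def load_active_lines(cfg_path):
    """Return the lines to process, skipping blank lines and '#' lines."""
    with open(cfg_path) as handle:
        lines = [line.rstrip("\n") for line in handle]
    return [line for line in lines if line.strip() and not line.startswith("#")]
```

For a host named "server1" with $SADMIN set to /sadmin, ensure_sysmon_config("/sadmin/cfg", "server1") would return "/sadmin/cfg/server1.smon".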


REQUIREMENTS
 
EXIT STATUS
[0]    An exit status of zero indicates success
[1]    Failure is indicated by a nonzero value, typically ‘1’.

 
AUTHOR
Jacques Duplessis (jacques.duplessis@sadmin.ca).
Suggestions and bug reports can be submitted at http://www.sadmin.ca/support.php

 
COPYRIGHT
Copyright © 2018 Free Software Foundation, Inc. License GPLv3+:
    - GNU GPL version 3 or later http://gnu.org/licenses/gpl.html.
This is free software, you are free to change and redistribute it.
There is NO WARRANTY to the extent permitted by law.

 
SEE ALSO
sadm_sysmon_tui.sh   (System Monitor Terminal UI)
smon (link to sadm_sysmon_cli.sh)   (Run System Monitor and show results)


 
INDEX
NAME
SYNOPSIS
DESCRIPTION
OPTIONS
REQUIREMENTS
EXIT STATUS
AUTHOR
COPYRIGHT
SEE ALSO

Copyright © 2015-2019 - www.sadmin.ca - Suggestions, Questions or Report a problem at support@sadmin.ca