From f8d0b344ae1b2dc3894c1a597c0565911b762742 Mon Sep 17 00:00:00 2001
From: John Klug
Date: Tue, 19 Jan 2021 17:21:32 -0600
Subject: softdog-mon for monitoring a system using kernel module "softdog"

---
 README | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 84 insertions(+)
 create mode 100644 README

diff --git a/README b/README
new file mode 100644
index 0000000..6d0e22f
--- /dev/null
+++ b/README
@@ -0,0 +1,84 @@
+Configurable monitor program that uses softdog.
+
+This program (softdog-mon) must keep running
+once it has been started, or the
+system will reset itself without a shutdown.
+The problems detected may prevent shutdown, so a
+reset is safer.
+
+The SHUTDOWNTIMEOUT value is the time to allow for
+shutdown. Since firmware updates are done during
+shutdown, this should be the worst-case time for
+shutdown.
+
+Variables passed through the environment:
+
+# The monitor program has a fixed granularity of 1 second.
+# All times are in seconds.
+
+# The hardware watchdog is found first and becomes watchdog0.
+WATCHDOG = /dev/watchdog1
+
+# Nice value: -20 is the highest priority for a user program, 19 the lowest.
+NICE = -20
+
+# Watchdog timeout in seconds
+TIMEOUT = 60
+
+# How often to feed the watchdog, in seconds
+FEED = 10
+
+# File is synchronously opened/read/written/closed every 30 seconds
+FILESAMPLERATE = 30
+
+# File to be read/written
+# If I/O hangs, the TIMEOUT value is the maximum number of seconds
+# until the device is reset.
+MONITORFILE = /media/card/.softdog_monitor
+
+# Minimum available system memory in bytes
+MINIMUM_AVAILABLE_MEM = 3000000
+
+# Minimum free high memory
+MINIMUM_FREEHIGH = 0
+
+# Rate at which we sample available memory
+MEMSAMPLERATE = 3
+
+# Number of most recent samples saved
+MEMSAMPLES = 100
+
+# Maximum number of failed samples among the saved samples
+MEMFAILEDSAMPLES = 20
+
+# Allow time for flash upgrade during shutdown
+# This happens when a SIGTERM signal is received.
+# So shutdown has this many seconds to complete.
+SHUTDOWNTIMEOUT=600
+
+There is an additional test program called
+hog. This can be used to acquire memory and kernel
+resources.
+
+hog 4750000
+
+This will start five processes with 4750000 bytes
+of memory. The idea is to trigger the watchdog.
+
+ hog 4750000
+
+Creates five processes with the amount of memory
+specified.
+
+In a typical test:
+
+Log into the device several times with ssh, and run sudo -s
+to acquire a root shell.
+
+As root, start the program hog. The amount of memory required
+will depend on the size of the programs typically running.
+
+Start top in each of the logged-in sessions. Try to get the
+available memory below 3MB. Once 20 samples have failed
+the device will reboot.
+
-- 
cgit v1.2.3
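
The interplay of MINIMUM_AVAILABLE_MEM, MEMSAMPLES and MEMFAILEDSAMPLES described in the README can be illustrated with a small sliding-window counter. This is only a sketch of one plausible reading, not the softdog-mon implementation: the class MemSampleWindow and the function parse_mem_available are hypothetical names, reading MemAvailable from /proc/meminfo is an assumption about where the memory figure comes from, and triggering at "20 or more failed samples" is inferred from the README's "Once 20 samples have failed the device will reboot".

```python
from collections import deque


class MemSampleWindow:
    """Hypothetical helper: remembers pass/fail for the last `samples` readings."""

    def __init__(self, samples=100, max_failed=20, minimum_available=3_000_000):
        # deque(maxlen=...) silently drops the oldest entry when full,
        # which gives the "last samples saved" behaviour (MEMSAMPLES).
        self.window = deque(maxlen=samples)
        self.max_failed = max_failed                # MEMFAILEDSAMPLES
        self.minimum_available = minimum_available  # MINIMUM_AVAILABLE_MEM

    def add(self, available_bytes):
        """Record one sample; return True once enough saved samples have failed."""
        self.window.append(available_bytes < self.minimum_available)
        # Assumption: the threshold is inclusive (20 failed samples trigger).
        return sum(self.window) >= self.max_failed


def parse_mem_available(meminfo_text):
    """Return MemAvailable in bytes from /proc/meminfo-style text, or None.

    Assumption: the monitor's "available memory" corresponds to this field.
    """
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) * 1024  # the kernel reports kB
    return None


if __name__ == "__main__":
    window = MemSampleWindow()
    # Feed 20 consecutive readings below the 3000000-byte minimum.
    for i in range(1, 21):
        if window.add(2_000_000):
            print(f"threshold reached after {i} failed samples")
```

Under this reading, and if MEMSAMPLERATE is the number of seconds between samples, the default window covers roughly the last 300 seconds, and the threshold is reached after about 60 seconds of continuously low memory. Failed samples do not need to be consecutive; they only need to fall within the saved window.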