Configurable monitor program that uses softdog.

This program (softdog-mon) must be running all the time once it is started, or the system will reset itself without a shutdown. The problems detected may prevent shutdown, so a reset is safer.

The SHUTDOWNTIMEOUT value is the time to allow for shutdown. Since firmware updates are done during shutdown, this should be the worst case time for shutdown.

Variables passed through the environment:

# Monitor program will have 1 second granularity. Fixed.
# All times are in seconds.

# Hardware watchdog is found first, which is watchdog0.
WATCHDOG = /dev/watchdog1

# Nice value: -20 is the highest priority for a user program, 19 is the lowest.
NICE = -20

# Watchdog timeout in seconds
TIMEOUT = 60

# How often to feed in seconds
FEED = 10

# File is synchronously opened, read, written and closed every 30 seconds
FILESAMPLERATE = 30

# File to be read/written
# If I/O hangs, the TIMEOUT value is the maximum seconds until we
# reset the device.
MONITORFILE = /media/card/.softdog_monitor

# Minimum available system memory in bytes
MINIMUM_AVAILABLE_MEM = 3000000

# Minimum free high memory
MINIMUM_FREEHIGH = 0

# Rate at which we sample available memory
MEMSAMPLERATE = 3

# Number of most recent samples saved
MEMSAMPLES = 100

# Maximum number of failed samples among the saved samples
MEMFAILEDSAMPLES = 20

# Allow time for flash upgrade during shutdown
# This happens when a SIGTERM signal is received.
# So shutdown has this many seconds to complete.
SHUTDOWNTIMEOUT=600

There is an additional test program called hog. It can be used to acquire memory and kernel resources.

hog 4750000

This starts five processes with the amount of memory specified (here 4750000 bytes). The idea is to trigger the watchdog.

In a typical test: log into the device several times with ssh, and run sudo -s to acquire a root shell. As root, start the program hog; the amount of memory required will depend on the size of the programs typically running.
Start top in each of the logged-in sessions. Try to get the available memory below 3 MB (the MINIMUM_AVAILABLE_MEM value of 3000000 bytes). Once 20 samples have failed, the device will reboot.
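
The memory check described by MEMSAMPLES and MEMFAILEDSAMPLES can be pictured as a sliding window of pass/fail samples. The following is only a minimal sketch in Python, not the softdog-mon source: the names MemWindow and parse_mem_available are hypothetical, reading MemAvailable from /proc/meminfo is an assumption about where the value comes from, and triggering at "20 or more" failed samples is an assumption based on the test description above. If MEMSAMPLERATE is an interval in seconds, 20 failed samples would correspond to at least 60 seconds of low memory.

```python
from collections import deque

# Values taken from the configuration above.
MINIMUM_AVAILABLE_MEM = 3000000  # bytes
MEMSAMPLES = 100                 # samples kept in the window
MEMFAILEDSAMPLES = 20            # failed samples that trigger a reset


def parse_mem_available(meminfo_text):
    """Return MemAvailable in bytes from /proc/meminfo-style text.

    Assumption: the monitor looks at MemAvailable; the real program
    may use a different source.
    """
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # The value is reported in kB.
            return int(line.split()[1]) * 1024
    raise ValueError("MemAvailable not found")


class MemWindow:
    """Sliding window of pass/fail memory samples (hypothetical helper)."""

    def __init__(self, size=MEMSAMPLES, max_failed=MEMFAILEDSAMPLES,
                 minimum=MINIMUM_AVAILABLE_MEM):
        # deque(maxlen=...) drops the oldest sample automatically.
        self.samples = deque(maxlen=size)
        self.max_failed = max_failed
        self.minimum = minimum

    def add(self, available_bytes):
        """Record one sample; return True once the failure limit is reached."""
        self.samples.append(available_bytes < self.minimum)
        return sum(self.samples) >= self.max_failed
```

In a monitor loop, each sample would be passed to add(), and a True result would mean the watchdog should no longer be fed. Because old samples fall out of the window, a short dip in available memory does not accumulate toward a reset indefinitely.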