systemHealth - look after health of services and other infrastructure
This plugin checks services, sensors and devices for error messages and aggregates them. It is triggered
by the systemHealth action. It works together with the /systemHealth.html
template document and the neoclock plugin (which can show that an important exceptional condition currently exists).
The sequence of steps that happens is as follows:
- A plugin such as lan detects that a service is unavailable.
- systemHealth registers this in environment/systemHealth.
- neoclock detects this and shows a moderately visible warning to the end user.
- The end user uses a web browser to visit /systemHealth.html to see what the problem is.
- The end user now has two options to make the visible warning go away:
  - Fix the problem.
  - Use one of the ignore buttons. The error will subsequently still be recorded but it will not trigger the neoclock visible warning for an hour or a day or so.
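The effect of an ignore button can be sketched as follows. This is only an illustration, under the assumption that the button works by storing a timestamp in the service's ignoreErrorUntil field. The function name, the dict standing in for the service entry and the two durations are hypothetical, not taken from the plugin itself.

```python
import time

# Hypothetical durations; the text only says "an hour or a day or so".
IGNORE_SECONDS = {"hour": 60 * 60, "day": 24 * 60 * 60}


def ignore_error(service, period, now=None):
    """Suppress the visible warning for one service for the given period.

    `service` is a plain dict standing in for a database entry such as
    service/internet (an assumption made for this sketch).
    """
    if now is None:
        now = time.time()
    # The error message itself is left untouched, so it is still recorded.
    service["ignoreErrorUntil"] = now + IGNORE_SECONDS[period]
    return service["ignoreErrorUntil"]
```

Note that the error message stays in place: only the timestamp changes, which matches the description that the error is still recorded while the warning is suppressed.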
Each service (device, sensor) has two fields that are important. As an example, for service internet:
- service/internet/errorMessage: If this element exists and is non-empty it means the internet service has an error condition (and that condition is serious enough that we want to inform the user about it). The message will be copied into environment/systemHealth/messages/internet.
- service/internet/ignoreErrorUntil: If this value (timestamp) is set and is in the past it will be deleted. If it is in the future, the errorMessage field will not be copied into environment/systemHealth, effectively ignoring the error condition for some period of time.
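The rules for these two fields can be summarised in a short sketch. It is not the plugin's actual code: the services are modelled as plain dicts rather than database entries, the function name aggregate_health is made up for illustration, and timestamps are assumed to be Unix times in seconds.

```python
import time


def aggregate_health(services, now=None):
    """Return the contents for environment/systemHealth/messages.

    `services` maps a service name to a dict that may contain
    'errorMessage' and 'ignoreErrorUntil' (hypothetical layout).
    """
    if now is None:
        now = time.time()
    messages = {}
    for name, fields in services.items():
        ignore_until = fields.get("ignoreErrorUntil")
        if ignore_until is not None:
            if ignore_until <= now:
                # Timestamp is in the past (assumption: "now" counts
                # as past): delete it.
                del fields["ignoreErrorUntil"]
            else:
                # Timestamp is in the future: do not copy the message.
                continue
        message = fields.get("errorMessage")
        if message:
            # Exists and is non-empty: report it under the service name.
            messages[name] = message
    return messages
```

For example, with now=1000, a service whose ignoreErrorUntil is 2000 is skipped, while one whose ignoreErrorUntil is 500 loses that field and has its error message reported again.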
actions
The systemHealth action triggers this plugin.