- Learning NAGIOS 3.0
- Wojciech Kocjan
- 415 words
- 2021-08-25 18:05:38
Soft and Hard States
Nagios works by checking whether a particular host or service is working correctly and storing its status. Because a service's status can take only one of four possible values (OK, WARNING, CRITICAL, or UNKNOWN), it is crucial that this value reflects the actual current status. To avoid reporting random, temporary problems, Nagios uses soft and hard states to describe the current status of a host or service.
Imagine that an administrator restarts a web server and, as a result, its web pages are unavailable for five seconds. Because such restarts are usually done at night to reduce the number of affected users, this is an acceptable outage. However, a problem might arise if Nagios tries to connect to the server during those five seconds and notices that it is down. If it relied on a single result, Nagios would trigger an alert that the web server is down. In reality, the server would be up and running again a few seconds later, but it could take a couple of minutes for Nagios to find that out.
Soft states were introduced to handle situations in which a service is down for a very short time or a test has temporarily failed. When the result of a check is unknown, or differs from the previous one, Nagios treats the new result as a soft state and retests the host or service several times to make sure that the change is persistent. The number of checks is specified in the host or service configuration. Once the additional tests have confirmed that the new state is persistent, it becomes a hard state.
Each host and service definition specifies how many checks must be performed before a change can be assumed to be permanent. This gives flexibility over how many failures are treated as an actual problem rather than a temporary one. Setting the number of checks to one causes every change to be treated as hard immediately. The following is an illustration of soft and hard state changes, assuming that the number of checks to be performed is set to three:

This feature allows Nagios to ignore short outages of a service. It is also very useful for checks that can occasionally fail even when everything is working correctly. Monitoring devices over SNMP is one example: a single check might fail, while the second or third attempt succeeds.
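The retry logic described in this section can be sketched as a small state machine. This is a minimal Python sketch that assumes the simplified model given above: any result that differs from the last hard state is soft until it has been seen `max_check_attempts` times in a row. The parameter name mirrors the `max_check_attempts` directive of Nagios host and service definitions, but `CheckState` and `trace` are hypothetical names, and the code is not taken from Nagios Core, whose handling of details such as recoveries may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CheckState:
    """Hypothetical model of one host's or service's status (not Nagios Core code)."""

    max_check_attempts: int  # checks needed before a change is treated as hard
    hard_state: str = "OK"   # last confirmed (hard) state
    attempt: int = 0         # consecutive results that differ from hard_state

    def record(self, result: str) -> str:
        """Process one check result; return 'SOFT' or 'HARD' for the resulting state."""
        if result == self.hard_state:
            # Nothing changed, or a soft change did not persist: discard the soft state.
            self.attempt = 0
            return "HARD"
        self.attempt += 1
        if self.attempt >= self.max_check_attempts:
            # Enough consecutive checks disagree with the old hard state, so the
            # change is now treated as permanent (this is where an alert would go out).
            self.hard_state = result
            self.attempt = 0
            return "HARD"
        # Still waiting for more checks to confirm the change.
        return "SOFT"


def trace(results: List[str], max_check_attempts: int) -> List[str]:
    """Feed a sequence of check results through CheckState and label each one."""
    state = CheckState(max_check_attempts)
    return [f"{result} {state.record(result)}" for result in results]


if __name__ == "__main__":
    # A short web server restart caught by a single check: no hard state change.
    print(trace(["OK", "CRITICAL", "OK"], max_check_attempts=3))
    # A real outage confirmed by three consecutive failed checks.
    print(trace(["OK", "CRITICAL", "CRITICAL", "CRITICAL"], max_check_attempts=3))
    # With one check, every change becomes hard immediately.
    print(trace(["OK", "CRITICAL"], max_check_attempts=1))
```

In the first trace the single CRITICAL result stays soft and is cleared by the next OK, so no alert would be raised; in the second, the third consecutive failure turns it into a hard CRITICAL state. In this sketch, differing non-OK results (for example WARNING followed by CRITICAL) all count toward the same attempt counter.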