The first steps in monitoring

Situations similar to the one just described are more common than anyone would like. A system fault that showed no visible symptoms beforehand is relatively rare. A subsection of UNIX Administration Horror Stories (http://www-uxsup.csx.cam.ac.uk/misc/horror.txt) containing only stories about faults that weren't noticed in time could probably be compiled easily.

As experience shows, problems tend to happen when we are least equipped to solve them. To deal with them on our own terms, we turn to a class of software commonly referred to as network monitoring software. Such software usually allows us to constantly monitor what is happening in a computer network using one or more methods, and to notify the people responsible if a metric passes a defined threshold.

One of the first monitoring solutions most administrators implement is a simple shell script invoked from a crontab that checks some basic parameters, such as disk usage or the state of a service such as an Apache server. As the number of servers and monitored parameters grows, a neat and clean script system turns into a performance-hogging script hairball that costs more time in upkeep than it saves. While the do-it-yourself crowd claims that nobody needs dedicated software for most tasks (monitoring included), most administrators will disagree as soon as they have to add switches, UPSes, routers, IP cameras, and a myriad of other devices to the swarm of monitored objects.
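
To make the starting point concrete, here is a minimal sketch of such a cron-driven disk check, written in Python rather than shell for illustration. The function names, the `/` path, and the 90 percent threshold are assumptions chosen for this example, not anything prescribed by the text.

```python
import shutil
from typing import Optional


def used_percent(total: int, used: int) -> float:
    """Return the used share of a filesystem as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


def check_disk(path: str = "/", threshold: float = 90.0) -> Optional[str]:
    """Return a warning message if usage of `path` reaches `threshold`, else None.

    The default path and threshold are illustrative assumptions.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    percent = used_percent(usage.total, usage.used)
    if percent >= threshold:
        return f"WARNING: {path} is {percent:.1f}% full (threshold {threshold:.0f}%)"
    return None


if __name__ == "__main__":
    # Print only when there is a problem, so a quiet run produces no output.
    message = check_disk("/", 90.0)
    if message:
        print(message)
```

Run from a crontab entry, a script like this stays silent while all is well; on many systems cron mails any output to the crontab owner, which would serve as a crude alert. Each additional host, service, or threshold means another script or another branch, which is how the hairball described above begins.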

So, what basic functionality can one expect from a monitoring solution? Let's take a look:

  • Data gathering: This is where everything starts. Usually, data is gathered using various methods, including Simple Network Management Protocol (SNMP), agents, and Intelligent Platform Management Interface (IPMI).
  • Alerting: Gathered data can be compared to thresholds, and alerts can be sent out when required through different channels, such as e-mail or SMS.
  • Data storage: Once we have gathered the data, it doesn't make sense to throw it away, so we will often want to store it for later analysis.
  • Visualization: Humans interpret visualized data more readily than raw numbers, especially when there's a lot of data. As we already have the data gathered and stored, it is easy to generate simple graphs from it.
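
The four functions above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any particular monitoring product works: the `Item` class, the `poll` and `text_graph` helpers, and the in-memory history list are all hypothetical names invented for this example.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Item:
    """One monitored value (hypothetical structure for illustration)."""
    name: str
    gather: Callable[[], float]  # data gathering: any callable returning a number
    threshold: float             # alerting: limit the value should not exceed
    history: List[Tuple[float, float]] = field(default_factory=list)  # data storage


def poll(item: Item, notify: Callable[[str], None],
         now: Optional[float] = None) -> float:
    """Gather one value, store it with a timestamp, and alert if it is too high."""
    value = item.gather()
    item.history.append((time.time() if now is None else now, value))
    if value > item.threshold:
        notify(f"{item.name}: {value} exceeds threshold {item.threshold}")
    return value


def text_graph(item: Item, width: int = 20) -> List[str]:
    """Visualization: render stored values as bars scaled to the largest value."""
    peak = max((v for _, v in item.history), default=0.0)
    lines = []
    for _, value in item.history:
        bar = "#" * (round(width * value / peak) if peak > 0 else 0)
        lines.append(f"{value:8.2f} {bar}")
    return lines


if __name__ == "__main__":
    # Fixed sample readings stand in for a real data source such as SNMP.
    readings = iter([10.0, 95.0, 50.0])
    disk = Item("disk_used_percent", lambda: next(readings), threshold=90.0)
    for _ in range(3):
        poll(disk, notify=print)
    print("\n".join(text_graph(disk)))
```

Here `notify` is just `print`; in practice it would be whatever sends the e-mail or SMS. Keeping gathering, alerting, storage, and visualization as separate pieces is a design choice made for readability in this sketch.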

Sounds simple? That's because it is. But then we start to want more features, such as easy and efficient configuration, escalations, and permission delegation. If we sit down and start listing the things we want to keep an eye on, it may turn out that our area of interest extends beyond the network: for example, a hard drive that has Self-Monitoring, Analysis, and Reporting Technology (SMART) errors logged, an application that has too many threads, or a UPS that has one phase overloaded. It is much easier to manage the monitoring of all these different problem categories from a single configuration point.

In the quest for a manageable monitoring system, wondrous adventurers stumbled upon collections of scripts much like the ones they had implemented themselves, obscure and not-so-obscure workstation-level software, and heavy, expensive monitoring systems from big vendors.

Many went with a different category: free software. We will look at one free software monitoring solution, Zabbix.