
Counters, gauges, timers, and more

The most famous library is probably Metrics from Dropwizard (http://metrics.dropwizard.io), but all these libraries share more or less the same kind of API. The metrics are centered around a few important concepts:

  • Gauges: These provide the measure of a value at a certain point in time. They are intended to build a time series. The most famous examples are CPU or memory usage.
  • Counters: These are long values, often associated with a gauge in order to build a time series.
  • Histograms: These structures allow you to compute statistics around a value, for instance, the mean or the percentiles of request lengths.
  • Timers: These are a bit like histograms; they compute other metrics based on one measured value. Here, the goal is to get information about the rate of a value.
  • Health checks: These are less related to performance; they allow you to validate whether a resource (such as a database) is working. Health checks raise a warning/error if the resource isn't working.
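These concepts map directly to the Dropwizard Metrics API. The following is a minimal sketch, assuming the metrics-core dependency is on the classpath; the metric names are arbitrary examples:

```java
import com.codahale.metrics.Counter;
import com.codahale.metrics.Histogram;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;

public class MetricsConcepts {
    public static void main(String[] args) {
        // The registry is the central holder for all metrics of an application.
        MetricRegistry registry = new MetricRegistry();

        // Counter: a simple long value you increment/decrement.
        Counter requests = registry.counter("requests");
        requests.inc();

        // Histogram: statistical distribution of a value (mean, percentiles, ...).
        Histogram sizes = registry.histogram("request-sizes");
        sizes.update(512);
        sizes.update(1024);

        // Timer: a histogram of durations plus a throughput rate.
        Timer latency = registry.timer("request-latency");
        try (Timer.Context ctx = latency.time()) {
            // the work being measured would go here
        }

        System.out.println(requests.getCount());           // 1
        System.out.println(sizes.getSnapshot().getMean());
        System.out.println(latency.getCount());            // 1
    }
}
```

Note that `Timer.Context` is `Closeable`, so a try-with-resources block stops the timer automatically.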

All these libraries provide different ways to export/expose the collected data. Common configurations target JMX (through MBeans), Graphite, Elasticsearch, and so on, or simply the console/logger as the output.
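As a hedged sketch of such an export, this is how the console output is typically configured with Dropwizard's `ConsoleReporter` (the other reporters, such as the Graphite one, follow the same builder pattern):

```java
import com.codahale.metrics.ConsoleReporter;
import com.codahale.metrics.MetricRegistry;
import java.util.concurrent.TimeUnit;

public class ReporterSetup {
    public static void main(String[] args) {
        MetricRegistry registry = new MetricRegistry();
        registry.counter("requests").inc();

        // Build a reporter that dumps all metrics of the registry to stdout.
        ConsoleReporter reporter = ConsoleReporter.forRegistry(registry)
                .convertRatesTo(TimeUnit.SECONDS)
                .convertDurationsTo(TimeUnit.MILLISECONDS)
                .build();

        // One-shot dump; reporter.start(10, TimeUnit.SECONDS) would instead
        // report periodically in the background.
        reporter.report();
    }
}
```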

How can these concepts be linked to performance? The most important features for us will be the gauges and the counters. The gauges will enable us to make sure the server is doing well (for example, the CPU is not always at 100%, the memory is properly released, and so on). The counters will enable us to measure the execution time. They will also enable us to export the data to an aggregated storage if you test against multiple instances, allowing you to detect potential side effects of one instance on another (if you have any clustering, for example).
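To make the gauge part concrete, here is one way (a sketch, with an arbitrary metric name) to register a JVM heap-usage gauge with Dropwizard Metrics; a reporter sampling it periodically turns it into the memory time series mentioned above:

```java
import com.codahale.metrics.Gauge;
import com.codahale.metrics.MetricRegistry;

public class MemoryGauge {
    public static void main(String[] args) {
        MetricRegistry registry = new MetricRegistry();

        // Gauge is a single-method interface, so a lambda is enough.
        // The value is computed lazily each time a reporter reads it.
        registry.register("jvm.heap.used", (Gauge<Long>) () ->
                Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory());

        Gauge<?> gauge = registry.getGauges().get("jvm.heap.used");
        System.out.println(gauge.getValue());
    }
}
```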

Concretely, we want to measure some important segments of our code. In the extreme case, if you don't know anything about the application, you will likely want to measure all parts of the code and then refine the instrumentation as you gain more knowledge of your application.

To be very concrete and illustrate what we are trying to achieve, we want to wrap application methods with this kind of pattern:

@GET
@Path("{id}")
public JsonQuote findById(@PathParam("id") final long id) {
    final Timer.Context metricsTimer = getMonitoringTimer("findById").time();
    try {
        return defaultImpl();
    } finally {
        metricsTimer.stop();
    }
}
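The `getMonitoringTimer` helper is not defined in the snippet; one plausible implementation (hypothetical, assuming a shared `MetricRegistry` and a `QuoteResource` class name) would be:

```java
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;

public class QuoteResource {
    // Hypothetical shared registry; in a real application it would be
    // injected or managed by the framework/container.
    private static final MetricRegistry REGISTRY = new MetricRegistry();

    // Returns (creating it on first use) the timer for the given method,
    // namespaced by class so "findById" stays unambiguous across resources.
    Timer getMonitoringTimer(final String method) {
        return REGISTRY.timer(MetricRegistry.name(QuoteResource.class, method));
    }
}
```

`MetricRegistry.timer(name)` is get-or-create, so repeated calls for the same method keep accumulating into a single timer.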

In other words, we want to surround our business code with a timer to collect statistics about its execution time. One common poor man's solution you may be tempted to start with is to use loggers for this. It often looks as follows:

@GET
@Path("{id}")
public JsonQuote findById(@PathParam("id") final long id) {
    final long start = System.nanoTime();
    try {
        return defaultImpl();
    } finally {
        final long end = System.nanoTime();
        MONITORING_LOGGER.info("perf(findById) = " +
                TimeUnit.NANOSECONDS.toMillis(end - start) + "ms");
    }
}

The preceding code manually measures the execution time of the method and then dumps the result, with a description, into a dedicated logger so that the code portion it relates to can be identified.

The issue you will encounter in doing so is that you will not get any statistics about what you measure; you will need to preprocess all the data you collect, delaying the moment you can use the metrics to identify the hotspots of your application and work on them. This may not seem like a big issue but, as you are likely to repeat it many times during a benchmark phase, you will not want to do it manually.
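To illustrate that preprocessing burden: with raw log lines you end up writing throwaway code like the following (a plain-Java sketch, the sample durations are invented) just to recover the statistics a Timer would have computed for free:

```java
import java.util.Arrays;

public class LogStats {
    // Mean of raw durations in milliseconds, as parsed from the log lines.
    static double mean(long[] ms) {
        return Arrays.stream(ms).average().orElse(0);
    }

    // Nearest-rank percentile (p in [0, 1]) over the sorted durations.
    static long percentile(long[] ms, double p) {
        long[] sorted = ms.clone();
        Arrays.sort(sorted);
        int idx = (int) Math.ceil(p * sorted.length) - 1;
        return sorted[Math.max(idx, 0)];
    }

    public static void main(String[] args) {
        // Durations extracted from "perf(findById) = Xms" log lines.
        long[] durations = {12, 15, 11, 250, 13};
        System.out.println("mean = " + mean(durations));        // mean = 60.2
        System.out.println("p95  = " + percentile(durations, 0.95)); // p95  = 250
    }
}
```

Note how a single outlier (250 ms) dominates the picture; percentiles surface it immediately, which is exactly what a histogram/timer automates for you.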

The other issues relate to the fact that you need to add this sort of code to every method you want to measure. You will thus pollute your code with monitoring code, which is rarely worth it. The impact is even worse if you add it temporarily to get metrics and remove it later. This means you will want to avoid this kind of work as much as possible.

The final issue is that you can miss data about the server or libraries (dependencies), as you don't own that code. That means you may spend hours and hours working on a code block that is, in fact, not the slowest one.
