- Practical Site Reliability Engineering
- Pethuru Raj Chelliah Shreyash Naithani Shailender Singh
- 2021-06-10 19:08:11
Platform metrics
Platform metrics give you insight into an application's infrastructure: the average execution time of the top database queries, the queries that consume the most DTU/CPU, resource consumption by application, the average response time for each service endpoint, and each service's success/failure ratio. We should set up high-priority alerts on these metrics, because problems at this level can directly affect the user experience. The goal is to take a proactive approach and catch issues and outages before customers notice them. For example, we can set up automation that scales our system resources automatically during peak hours. This monitoring helps us understand the platform's performance.
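As a rough illustration of two of the metrics mentioned above, the following sketch computes the average response time per service endpoint and the success ratio per service from a list of request samples, then flags anything that crosses a threshold. The `RequestSample` type, the function names, and the threshold values (500 ms, 99% success) are all assumptions made for this example; in practice these numbers would come from your monitoring tool and your own service-level targets.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class RequestSample:
    """One observed request (hypothetical shape, for illustration only)."""
    service: str
    endpoint: str
    duration_ms: float
    success: bool


def average_response_time(samples):
    """Return the mean duration in ms for each (service, endpoint) pair."""
    durations = defaultdict(list)
    for s in samples:
        durations[(s.service, s.endpoint)].append(s.duration_ms)
    return {key: mean(values) for key, values in durations.items()}


def success_ratio(samples):
    """Return the fraction of successful requests for each service."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for s in samples:
        totals[s.service] += 1
        if s.success:
            successes[s.service] += 1
    return {service: successes[service] / totals[service] for service in totals}


def high_priority_alerts(samples, max_avg_ms=500.0, min_success_ratio=0.99):
    """Return alert messages for metrics that cross the assumed thresholds."""
    alerts = []
    for (service, endpoint), avg in sorted(average_response_time(samples).items()):
        if avg > max_avg_ms:
            alerts.append(
                f"{service} {endpoint}: average response time {avg:.0f} ms "
                f"exceeds {max_avg_ms:.0f} ms"
            )
    for service, ratio in sorted(success_ratio(samples).items()):
        if ratio < min_success_ratio:
            alerts.append(
                f"{service}: success ratio {ratio:.2%} is below {min_success_ratio:.2%}"
            )
    return alerts


if __name__ == "__main__":
    demo = [
        RequestSample("orders", "/checkout", 400.0, True),
        RequestSample("orders", "/checkout", 800.0, False),
        RequestSample("catalog", "/items", 100.0, True),
    ]
    for message in high_priority_alerts(demo):
        print(message)
```

A real setup would normally express these checks as alert rules in the monitoring platform rather than in application code, but the calculation behind each rule is the same.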