Alerts

Dashboards alone are not enough; we cannot expect developers to watch them 24/7. We need real-time alerting: the ability to set thresholds on metrics and to identify critical logs and events. As part of the alert setup, we also need to define the communication mechanism for the alert. This mechanism can range from a simple email to more sophisticated solutions such as PagerDuty.
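To make the idea concrete, here is a minimal sketch of an alert rule that pairs a metric threshold with a communication mechanism. The names `AlertRule`, `notify`, and `p99_latency_ms`, as well as the threshold value, are illustrative assumptions and do not come from any particular monitoring product. In practice the `notify` callable would wrap an email sender or a paging service; here it simply appends to a list.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AlertRule:
    """A hypothetical alert rule: one metric, one threshold, one channel."""
    metric: str
    threshold: float
    notify: Callable[[str], None]  # the communication mechanism (assumed)

    def evaluate(self, value: float) -> bool:
        """Send a notification and return True if value exceeds the threshold."""
        if value > self.threshold:
            self.notify(f"ALERT: {self.metric}={value} exceeds {self.threshold}")
            return True
        return False


# Stand-in channel: collect messages in a list instead of sending email.
sent: List[str] = []
rule = AlertRule(metric="p99_latency_ms", threshold=500.0, notify=sent.append)

rule.evaluate(320.0)  # below the threshold, nothing is sent
rule.evaluate(740.0)  # above the threshold, one message is sent
```

Keeping the channel as a plain callable means the same rule can be pointed at email, chat, or a paging service without changing the threshold logic.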

A breach of these thresholds signals a condition that could lead to an outage, cause a spike in latency, or otherwise affect the customer experience, so a notification needs to go out to the relevant teams so that they can rectify the situation. Importantly, the thresholds should be set so that the notification goes out before a catastrophic situation occurs, leaving the team sufficient time to debug and correct the problem.
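One common way to leave the team time to react is to define an early warning level below the critical level. The sketch below assumes that two-level scheme; the function name `classify` and the example values are hypothetical, and the right levels depend on the system being monitored.

```python
from typing import Optional


def classify(value: float, warning: float, critical: float) -> Optional[str]:
    """Return "critical", "warning", or None for a single metric reading.

    The warning level must sit below the critical level so that the first
    notification goes out while there is still time to act.
    """
    if warning >= critical:
        raise ValueError("warning threshold must be below critical threshold")
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return None


# Example with assumed disk-usage levels (percent): warn at 70, critical at 90.
for reading in (50.0, 75.0, 95.0):
    print(reading, classify(reading, warning=70.0, critical=90.0))
```

The gap between the two levels is effectively the team's time budget: the wider it is, the earlier the first notification arrives, at the cost of more frequent warnings.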
