While monitoring a cluster in the cell, the administrator notices that one server in the cluster periodically loses its connection to the database. When this happens, requests to that server show a significantly decreased response time, and various error conditions appear in the server's log files. Because the error codes are returned quickly, the server starts returning responses faster than the application's average service time. As a result, the server's weight is increased, a large percentage of incoming requests is routed to the faulty server, and the server becomes overloaded.
How can the administrator detect these conditions in the future and take action to prevent this problem?
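The storm drain condition tracks requests that have a significantly decreased response time. A health policy based on this condition relies on change point detection on the given time series data, so it can flag a server whose responses suddenly become much faster than normal and allow action to be taken on that server.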
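To illustrate what change point detection on response-time data can look like, here is a minimal sketch in Python. It is not the product's actual algorithm; the function name `detect_response_time_drop`, its parameters, and the sample values are all hypothetical. It compares the mean of a sliding window against a baseline taken from the first samples and reports the first window whose mean is significantly lower.

```python
from statistics import mean, stdev


def detect_response_time_drop(samples, baseline_size=10, window=5, threshold=3.0):
    """Return the start index of the first window whose mean response time
    is significantly below the baseline mean, or None if no drop is found.

    Hypothetical illustration only: the baseline is the first
    `baseline_size` samples, and "significantly below" means the window
    mean is more than `threshold` standard errors under the baseline mean.
    """
    if len(samples) < baseline_size + window:
        return None  # not enough data to compare a window with the baseline

    baseline = samples[:baseline_size]
    mu = mean(baseline)
    sigma = stdev(baseline) or 1e-9  # guard against a perfectly flat baseline
    standard_error = sigma / window ** 0.5

    for start in range(baseline_size, len(samples) - window + 1):
        recent = samples[start:start + window]
        z = (mean(recent) - mu) / standard_error
        if z < -threshold:
            return start  # sustained drop: possible storm drain
    return None


# Made-up response times in milliseconds: healthy, then fast error responses.
healthy = [100, 102, 98, 101, 99, 103, 97, 100, 102, 98]
failing = [20, 18, 22, 19, 21]
drop_index = detect_response_time_drop(healthy + failing)
```

With these made-up numbers the drop is reported at the first failing sample. A check of this kind only detects the condition; what to do with the affected server afterwards is a separate decision.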