Two related fixes for a stuck-open 'Monitoring' entry in the incident header:

1. serviceIsOk(GENERAL) is now called when a monitoring cycle completes successfully. Previously GENERAL could only accumulate failures (via the outer Throwable catch), with no complementary recovery, so once the catch-all fired the service stayed red forever.

2. checkEdqs() is now wrapped in its own try/catch that reports any non-ServiceFailureException failures under EDQS rather than GENERAL. Connection/read timeouts hitting /api/entitiesQuery/find previously propagated unwrapped and were bucketed as GENERAL, which hid the fact that EDQS was the failing component.

pull/15456/head
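The two fixes can be illustrated with a minimal, self-contained sketch. This is not the code from the diff: the class `MonitoringSketch`, the `Check` interface, the `failing` map, the `serviceFailure` helper, and the exact shape of `ServiceFailureException` are all hypothetical stand-ins. Only the names `serviceIsOk`, `checkEdqs`, `GENERAL`, `EDQS`, and `ServiceFailureException` come from the commit message. The sketch also assumes that an EDQS failure is recorded in place and the cycle continues, which the message does not confirm.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch of the failure-bucketing pattern described above.
public class MonitoringSketch {
    enum Service { GENERAL, EDQS }

    // Assumed shape: an exception that already knows which service failed.
    static class ServiceFailureException extends RuntimeException {
        final Service service;
        ServiceFailureException(Service service, Throwable cause) {
            super(cause);
            this.service = service;
        }
    }

    // Hypothetical functional interface standing in for one monitoring check.
    interface Check { void run() throws Exception; }

    // true = service currently reported as failing (assumed bookkeeping).
    final Map<Service, Boolean> failing = new EnumMap<>(Service.class);

    void serviceIsOk(Service s) { failing.put(s, false); }
    void serviceFailure(Service s, Throwable t) { failing.put(s, true); }

    void runCycle(Check otherChecks, Check checkEdqs) {
        try {
            otherChecks.run();
            try {
                checkEdqs.run();
            } catch (ServiceFailureException e) {
                throw e; // already attributed to a service; let the outer handler report it
            } catch (Exception e) {
                // Fix 2: timeouts and other raw errors are reported under EDQS, not GENERAL.
                serviceFailure(Service.EDQS, e);
            }
            // Fix 1: a cycle that gets this far clears any earlier GENERAL failure.
            serviceIsOk(Service.GENERAL);
        } catch (ServiceFailureException e) {
            serviceFailure(e.service, e);
        } catch (Throwable t) {
            // Catch-all: anything unattributed is bucketed as GENERAL.
            serviceFailure(Service.GENERAL, t);
        }
    }
}
```

Without the `serviceIsOk(Service.GENERAL)` call, nothing in this sketch would ever reset GENERAL after the catch-all fired, which mirrors the stuck-open entry the commit message describes.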
1 changed file with 13 additions and 4 deletions