Alerts & Incidents

Get notified when things break. tracer's alerting system lets you configure who gets notified, when, and how.

Overview

The alerting system consists of:

  1. Alert Channels - Where notifications are sent (Slack, Email, Webhooks)
  2. Alert Rules - When to trigger alerts (failures, latency, visual diff)
  3. Incidents - Track and resolve ongoing issues

Quick Setup

  1. Create a channel - Add your Slack workspace, email, or webhook
  2. Create a rule - Define when to alert
  3. Link to monitors - Apply rules to your monitors
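
The actual setup happens in the tracer UI, but it can help to see the three steps as data. The sketch below is a minimal TypeScript model of a channel, a rule, and a monitor link; every shape and field name here is an illustrative assumption, not tracer's actual configuration API.

```ts
// Illustrative model of the three setup steps; not tracer's real API.

type Channel =
  | { kind: "slack"; webhookUrl: string }
  | { kind: "email"; to: string[] }
  | { kind: "webhook"; url: string };

interface AlertRule {
  name: string;
  consecutiveFailures: number; // trigger after this many failed runs in a row
  channels: Channel[];
}

interface MonitorAlertConfig {
  monitorId: string;
  rules: AlertRule[];
}

// 1. Create a channel, 2. create a rule, 3. link it to a monitor.
const onCall: Channel = { kind: "slack", webhookUrl: "https://hooks.slack.com/services/..." };

const checkoutAlerts: MonitorAlertConfig = {
  monitorId: "checkout-journey",
  rules: [{ name: "checkout-down", consecutiveFailures: 3, channels: [onCall] }],
};

console.log(JSON.stringify(checkoutAlerts, null, 2));
```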

How Alerting Works

Monitor Run → Check Rules → Trigger Alert → Send to Channels → Create Incident

Recovery → Send Recovery → Close Incident

Alert Flow

  1. Monitor runs and collects results
  2. Rules evaluate against results (e.g., 3 consecutive failures)
  3. Alert triggers when rule conditions are met
  4. Notifications sent to configured channels
  5. Incident created to track the issue
  6. Recovery detected when monitor succeeds again
  7. Recovery notification sent
  8. Incident closed automatically
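
A minimal sketch of that flow for a consecutive-failure rule, written as a single evaluation step over in-memory state. The names, shapes, and threshold here are illustrative assumptions, not tracer's internals.

```ts
// Illustrative sketch of the alert flow, not tracer's internals.

type RunStatus = "success" | "failure";

interface State {
  consecutiveFailures: number;
  openIncidentId: string | null;
}

const THRESHOLD = 3; // e.g. alert after 3 consecutive failures

function notify(message: string): void {
  console.log(`[notify] ${message}`); // stand-in for "send to configured channels"
}

function onMonitorRun(state: State, status: RunStatus): State {
  if (status === "failure") {
    const consecutiveFailures = state.consecutiveFailures + 1;
    // Rule met and no open incident yet: trigger the alert and create an incident.
    if (consecutiveFailures >= THRESHOLD && state.openIncidentId === null) {
      const openIncidentId = `inc-${Date.now()}`;
      notify(`monitor failing (${consecutiveFailures} runs in a row), opened ${openIncidentId}`);
      return { consecutiveFailures, openIncidentId };
    }
    return { ...state, consecutiveFailures };
  }

  // Success: if an incident is open, this run is the recovery.
  if (state.openIncidentId !== null) {
    notify(`monitor recovered, closed ${state.openIncidentId}`);
  }
  return { consecutiveFailures: 0, openIncidentId: null };
}

// Two failures stay quiet, the third alerts once, the next success sends the recovery.
let state: State = { consecutiveFailures: 0, openIncidentId: null };
for (const status of ["failure", "failure", "failure", "success"] as const) {
  state = onMonitorRun(state, status);
}
```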

Deduplication

tracer prevents alert fatigue with intelligent deduplication:

  • Multiple failures don't send multiple alerts
  • Only one alert per incident
  • Recovery notification sent once when resolved
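
Conceptually the guard is just a dedup key per open incident: the first alert records the key, repeat failures are suppressed, and the recovery notification clears it exactly once. A sketch, with the key format assumed for illustration:

```ts
// Sketch of the deduplication guard: at most one alert per open incident.

const alertedIncidents = new Set<string>(); // keyed by monitor + rule, for example

function shouldSendAlert(dedupKey: string): boolean {
  if (alertedIncidents.has(dedupKey)) return false; // already alerted for this incident
  alertedIncidents.add(dedupKey);
  return true;
}

function shouldSendRecovery(dedupKey: string): boolean {
  return alertedIncidents.delete(dedupKey); // true only the first time it resolves
}

console.log(shouldSendAlert("checkout-journey:consecutive-failures"));    // true  -> send alert
console.log(shouldSendAlert("checkout-journey:consecutive-failures"));    // false -> suppressed
console.log(shouldSendRecovery("checkout-journey:consecutive-failures")); // true  -> send recovery once
console.log(shouldSendRecovery("checkout-journey:consecutive-failures")); // false -> nothing to send
```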

Alert Types

  Type                   Description                              Example
  Consecutive Failures   Alert after N failures in a row          Alert after 3 failed runs
  Error Rate             Alert when failure % exceeds threshold   Alert if >10% fail in 1 hour
  Latency                Alert when response time exceeds limit   Alert if response >2 seconds
  Visual Diff            Alert on visual changes                  Alert if UI diff >5%
  SSL Expiry             Alert before certificate expires         Alert 30 days before expiry
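
Each alert type reduces to a predicate over recent run results. The sketch below mirrors the example thresholds in the table; the RunResult shape and its field names are assumptions made for illustration.

```ts
// Illustrative predicates for the alert types above; the RunResult shape is assumed.

interface RunResult {
  ok: boolean;
  responseMs: number;        // response time in milliseconds
  visualDiffPct: number;     // % of the page that changed vs. the baseline
  sslDaysRemaining: number;  // days until the certificate expires
}

const consecutiveFailures = (runs: RunResult[], n = 3) =>
  runs.length >= n && runs.slice(-n).every((r) => !r.ok);

const errorRateExceeded = (runsInWindow: RunResult[], thresholdPct = 10) =>
  runsInWindow.length > 0 &&
  (runsInWindow.filter((r) => !r.ok).length / runsInWindow.length) * 100 > thresholdPct;

const latencyExceeded = (run: RunResult, limitMs = 2000) => run.responseMs > limitMs;
const visualDiffExceeded = (run: RunResult, limitPct = 5) => run.visualDiffPct > limitPct;
const sslExpiringSoon = (run: RunResult, days = 30) => run.sslDaysRemaining <= days;

const recent: RunResult[] = [
  { ok: false, responseMs: 2500, visualDiffPct: 0, sslDaysRemaining: 90 },
  { ok: false, responseMs: 2600, visualDiffPct: 0, sslDaysRemaining: 90 },
  { ok: false, responseMs: 2700, visualDiffPct: 0, sslDaysRemaining: 90 },
];

console.log(consecutiveFailures(recent)); // true: 3 failed runs in a row
console.log(latencyExceeded(recent[2]));  // true: 2700 ms > 2000 ms
```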

Incident Management

When an alert triggers, an incident is created:

  • Status: Open → Acknowledged → Resolved
  • Timeline: All related events logged
  • Duration: Time from trigger to resolution
  • Affected: Which monitors/journeys
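
Those fields map naturally onto a small record. The shape below is an assumption for illustration, not tracer's API; the point is that duration is derived from the open and resolve timestamps.

```ts
// Assumed shape of an incident record, for illustration only.

type IncidentStatus = "open" | "acknowledged" | "resolved";

interface IncidentEvent {
  at: Date;
  message: string; // e.g. "alert triggered", "acknowledged by on-call", "recovered"
}

interface Incident {
  id: string;
  status: IncidentStatus;
  timeline: IncidentEvent[];    // all related events, in order
  openedAt: Date;
  resolvedAt?: Date;            // set when the incident closes
  affectedMonitorIds: string[]; // which monitors/journeys are impacted
}

// Duration: time from trigger to resolution (or "so far" while still open).
const durationMs = (i: Incident) =>
  (i.resolvedAt ?? new Date()).getTime() - i.openedAt.getTime();

const example: Incident = {
  id: "inc-42",
  status: "resolved",
  timeline: [{ at: new Date("2024-01-01T10:00:00Z"), message: "alert triggered" }],
  openedAt: new Date("2024-01-01T10:00:00Z"),
  resolvedAt: new Date("2024-01-01T10:25:00Z"),
  affectedMonitorIds: ["checkout-journey"],
};

console.log(durationMs(example) / 60_000); // 25 minutes
```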

Incident Lifecycle

┌────────────────────────────────────────────────────────┐
│  Alert                                                 │
│  Triggered  →  Incident  →  Acknowledged  →  Resolved  │
│                Created      (Optional)       Closed    │
└────────────────────────────────────────────────────────┘
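
Because acknowledgement is optional, there are two valid paths to Resolved. A sketch of the lifecycle as an allowed-transition table (status names follow the diagram above; the code is illustrative, not tracer's implementation):

```ts
// Illustrative state machine for the incident lifecycle.

type IncidentStatus = "open" | "acknowledged" | "resolved";

const allowedTransitions: Record<IncidentStatus, IncidentStatus[]> = {
  open: ["acknowledged", "resolved"], // may resolve directly, skipping the optional ack
  acknowledged: ["resolved"],
  resolved: [],                       // terminal in this sketch
};

function canTransition(from: IncidentStatus, to: IncidentStatus): boolean {
  return allowedTransitions[from].includes(to);
}

console.log(canTransition("open", "resolved")); // true: acknowledgement is optional
console.log(canTransition("resolved", "open")); // false: resolved is terminal here
```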

Best Practices

  1. Start simple - Begin with consecutive failure alerts
  2. Avoid alert fatigue - Set reasonable thresholds
  3. Use escalation - Email → Slack → PagerDuty (see the sketch after this list)
  4. Acknowledge alerts - Let the team know someone's looking
  5. Review regularly - Tune thresholds based on experience
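
One way to implement the escalation in step 3 is an ordered policy that adds louder channels the longer an incident stays open. The channel names and timings below are assumptions for illustration, not built-in tracer behavior.

```ts
// Illustrative escalation chain: wider/louder channels the longer an incident stays open.

interface EscalationStep {
  afterMinutes: number; // minutes since the incident opened
  channel: "email" | "slack" | "pagerduty";
}

const escalationPolicy: EscalationStep[] = [
  { afterMinutes: 0, channel: "email" },      // immediately
  { afterMinutes: 10, channel: "slack" },     // still open after 10 minutes
  { afterMinutes: 30, channel: "pagerduty" }, // page the on-call after 30 minutes
];

// Which channels should have been notified for an incident open this long?
function dueChannels(openForMinutes: number): EscalationStep["channel"][] {
  return escalationPolicy
    .filter((step) => openForMinutes >= step.afterMinutes)
    .map((step) => step.channel);
}

console.log(dueChannels(12)); // ["email", "slack"]
```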