Alerts & Incidents
Configure alert channels and rules, and manage incidents, with tracer's alerting system.
Get notified when things break. tracer's alerting system lets you configure who gets notified, when, and how.
Overview
The alerting system consists of:
- Alert Channels - Where notifications are sent (Slack, Email, Webhooks)
- Alert Rules - When to trigger alerts (failures, latency, visual diff)
- Incidents - Track and resolve ongoing issues
Quick Setup
- Create a channel - Add your Slack workspace, email, or webhook
- Create a rule - Define when to alert
- Link to monitors - Apply rules to your monitors
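The sketch below makes the three steps concrete as API calls. It is only an illustration: the base URL, endpoints, payload fields, and the `checkout-flow` monitor ID are assumptions, not tracer's actual API.

```ts
// Hypothetical example: endpoints and payload fields are illustrative only.
const BASE_URL = "https://tracer.example.com/api"; // placeholder URL

// Minimal JSON POST helper
async function post(path: string, body: unknown): Promise<any> {
  const res = await fetch(`${BASE_URL}${path}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Request failed: ${res.status}`);
  return res.json();
}

async function quickSetup(): Promise<void> {
  // 1. Create a channel (here: a Slack webhook)
  const channel = await post("/alert-channels", {
    type: "slack",
    name: "on-call-slack",
    webhookUrl: "https://hooks.slack.com/services/T000/B000/XXXX",
  });

  // 2. Create a rule: alert after 3 consecutive failures
  const rule = await post("/alert-rules", {
    name: "three-strikes",
    type: "consecutive_failures",
    threshold: 3,
    channelIds: [channel.id],
  });

  // 3. Link the rule to a monitor (monitor ID is made up for the example)
  await post("/monitors/checkout-flow/alert-rules", { ruleId: rule.id });
}
```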
Alert Channels
Configure Slack, Email, Webhooks, and Teams notifications
Alert Rules
Define conditions for triggering alerts
How Alerting Works
Monitor Run → Check Rules → Trigger Alert → Send to Channels → Create Incident
                                      ↓
                  Recovery → Send Recovery → Close Incident

Alert Flow
- Monitor runs and collects results
- Rules evaluate against results (e.g., 3 consecutive failures)
- Alert triggers when rule conditions are met
- Notifications sent to configured channels
- Incident created to track the issue
- Recovery detected when monitor succeeds again
- Recovery notification sent
- Incident closed automatically
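To make the flow concrete, here is a minimal sketch of how a consecutive-failure rule could be evaluated on each run. The types, threshold, and helper functions are assumptions made for the example, not tracer's internals.

```ts
// Illustrative sketch of the alert flow, not tracer's implementation.
type RunResult = { monitorId: string; ok: boolean; at: Date };

interface AlertState {
  consecutiveFailures: number;
  openIncidentId?: string;
}

const FAILURE_THRESHOLD = 3; // e.g. "alert after 3 consecutive failures"

function onMonitorRun(state: AlertState, result: RunResult): AlertState {
  if (!result.ok) {
    const failures = state.consecutiveFailures + 1;
    // Rule condition met and no incident open yet: trigger alert, create incident
    if (failures >= FAILURE_THRESHOLD && !state.openIncidentId) {
      const incidentId = createIncident(result.monitorId);
      notifyChannels(`Monitor ${result.monitorId} failing (${failures} in a row)`);
      return { consecutiveFailures: failures, openIncidentId: incidentId };
    }
    return { ...state, consecutiveFailures: failures };
  }

  // Success: if an incident is open, send the recovery notification and close it
  if (state.openIncidentId) {
    notifyChannels(`Monitor ${result.monitorId} recovered`);
    closeIncident(state.openIncidentId);
  }
  return { consecutiveFailures: 0 };
}

// Placeholder side effects for the sketch
function createIncident(monitorId: string): string {
  return `inc_${monitorId}_${Date.now()}`;
}
function closeIncident(id: string): void {
  console.log(`closed ${id}`);
}
function notifyChannels(message: string): void {
  console.log(message);
}
```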
Deduplication
tracer prevents alert fatigue with intelligent deduplication:
- Multiple failures don't send multiple alerts
- Only one alert per incident
- Recovery notification sent once when resolved
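One way to picture the dedup behaviour: alerts keyed by monitor and rule are suppressed while an incident for that key is still open. The key format and in-memory map below are assumptions for illustration, not how tracer stores incidents.

```ts
// Sketch of deduplication: one alert per incident, one recovery notification.
const openIncidents = new Map<string, string>(); // "monitorId:ruleId" -> incidentId

function maybeAlert(monitorId: string, ruleId: string, message: string): void {
  const key = `${monitorId}:${ruleId}`;
  if (openIncidents.has(key)) return; // incident already open: suppress duplicate alerts
  openIncidents.set(key, `inc_${Date.now()}`);
  console.log(`ALERT: ${message}`); // exactly one alert per incident
}

function maybeRecover(monitorId: string, ruleId: string): void {
  const key = `${monitorId}:${ruleId}`;
  if (!openIncidents.has(key)) return; // nothing open, nothing to recover
  openIncidents.delete(key);
  console.log(`RECOVERED: ${monitorId}`); // recovery notification sent once
}
```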
Alert Types
| Type | Description | Example |
|---|---|---|
| Consecutive Failures | Alert after N failures in a row | Alert after 3 failed runs |
| Error Rate | Alert when failure % exceeds threshold | Alert if >10% fail in 1 hour |
| Latency | Alert when response time exceeds limit | Alert if response >2 seconds |
| Visual Diff | Alert on visual changes | Alert if UI diff >5% |
| SSL Expiry | Alert before certificate expires | Alert 30 days before expiry |
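The predicates below sketch how a few of these rule types could be checked. Thresholds, field names, and data shapes are examples taken from the table, not tracer's configuration schema.

```ts
// Illustrative rule-type checks; shapes and defaults are assumptions.
type Run = { ok: boolean; responseMs: number; at: Date };

// Error Rate: more than 10% of runs in the last hour failed
function errorRateExceeded(runs: Run[], threshold = 0.1): boolean {
  const cutoff = Date.now() - 60 * 60 * 1000;
  const recent = runs.filter((r) => r.at.getTime() >= cutoff);
  if (recent.length === 0) return false;
  const failed = recent.filter((r) => !r.ok).length;
  return failed / recent.length > threshold;
}

// Latency: response slower than 2 seconds
function latencyExceeded(run: Run, limitMs = 2000): boolean {
  return run.responseMs > limitMs;
}

// SSL Expiry: certificate expires within 30 days
function sslExpiringSoon(notAfter: Date, days = 30): boolean {
  return notAfter.getTime() - Date.now() < days * 24 * 60 * 60 * 1000;
}
```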
Incident Management
When an alert triggers, an incident is created:
- Status: Open → Acknowledged → Resolved
- Timeline: All related events logged
- Duration: Time from trigger to resolution
- Affected: Which monitors/journeys
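A small state machine captures the status transitions and duration described above. The field names and transition table are illustrative, not tracer's data model.

```ts
// Sketch of the incident lifecycle as a state machine.
type IncidentStatus = "open" | "acknowledged" | "resolved";

interface Incident {
  id: string;
  monitorId: string;
  status: IncidentStatus;
  openedAt: Date;
  resolvedAt?: Date;
}

// Allowed moves: open -> acknowledged (optional) -> resolved,
// or straight from open to resolved on automatic recovery.
const transitions: Record<IncidentStatus, IncidentStatus[]> = {
  open: ["acknowledged", "resolved"],
  acknowledged: ["resolved"],
  resolved: [],
};

function transition(incident: Incident, next: IncidentStatus): Incident {
  if (!transitions[incident.status].includes(next)) {
    throw new Error(`Cannot move incident from ${incident.status} to ${next}`);
  }
  return {
    ...incident,
    status: next,
    resolvedAt: next === "resolved" ? new Date() : incident.resolvedAt,
  };
}

// Duration: time from trigger to resolution (undefined while still open)
function durationMs(incident: Incident): number | undefined {
  return incident.resolvedAt
    ? incident.resolvedAt.getTime() - incident.openedAt.getTime()
    : undefined;
}
```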
Incident Lifecycle
┌─────────────────────────────────────────────────────────┐
│  Alert                                                   │
│  Triggered  →  Incident  →  Acknowledged  →  Resolved    │
│             Created        (Optional)       Closed       │
└─────────────────────────────────────────────────────────┘

Best Practices
- Start simple - Begin with consecutive failure alerts
- Avoid alert fatigue - Set reasonable thresholds
- Use escalation - Email → Slack → PagerDuty
- Acknowledge alerts - Let the team know someone's looking into it
- Review regularly - Tune thresholds based on experience