Complete API Service Outage

Jan 14 at 06:23pm EST
Affected services
minerva.io
Enrich
LinkedIn Contact Data

Status: Resolved
Duration: 2 hours, 8 minutes
Date: January 13, 2026
Affected Services: All API endpoints (Enrich, Resolve, LinkedIn Contact Data), OAuth 2.0 Authentication
Impact: Complete service unavailability for all customers

TIMELINE (All times EST)

Jan 13, 2026 - 1:12 PM - Investigating
Our monitoring systems detected widespread failures across API endpoints. The engineering team was notified and began investigating.
Jan 13, 2026 - 1:45 PM - Identified
Root cause identified: AWS Lambda account-level concurrency exhaustion (limit of 1,000 concurrent executions) caused by a high-volume customer operation. This triggered cascading database connection pool exhaustion, resulting in complete API unavailability.
Jan 13, 2026 - 2:55 PM - Monitoring
Applied reserved concurrency limits to critical authentication services. Monitoring for stability and additional throttling.
Jan 13, 2026 - 3:40 PM - Resolved
All services restored to normal operation. API response times returned to baseline. No data loss or corruption occurred.

WHAT HAPPENED

A large-scale data operation from a single customer generated approximately 83,000 API requests within one hour, exhausting our AWS Lambda account-wide concurrency limit. This caused Lambda function throttling across all customer-facing services, database connection pool exhaustion (approximately 4,000 simultaneous connections), and complete unavailability of authentication and API endpoints.
Data Security: No customer data was compromised, exposed, or modified. This was strictly an availability incident.
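
As a rough sanity check on the figures above, Little's law (concurrency = arrival rate x mean duration) relates the reported request volume to the concurrency limit. The sketch below uses only the two numbers from this report; the function names are illustrative, and it assumes one Lambda invocation per API request, which the report does not state.

```python
# Back-of-the-envelope check using Little's law:
#   concurrency = arrival rate (requests/s) x mean invocation duration (s)
REQUESTS_PER_HOUR = 83_000   # from the incident report
CONCURRENCY_LIMIT = 1_000    # account-level Lambda limit from the report


def average_rate_per_second(requests_per_hour: float) -> float:
    """Average arrival rate if the requests were spread evenly over the hour."""
    return requests_per_hour / 3600.0


def duration_to_saturate(limit: int, rate_per_second: float) -> float:
    """Mean invocation duration (seconds) at which this rate fills `limit` slots."""
    return limit / rate_per_second


rate = average_rate_per_second(REQUESTS_PER_HOUR)
saturating_duration = duration_to_saturate(CONCURRENCY_LIMIT, rate)

print(f"average rate: {rate:.1f} requests/s")
print(f"mean duration needed to hold {CONCURRENCY_LIMIT} slots: {saturating_duration:.1f} s")
```

At the hourly average of about 23 requests per second, invocations would need to run for roughly 43 seconds each to occupy all 1,000 slots. That suggests the traffic arrived in bursts, invocations were slowed by downstream waits, or each request fanned out into several invocations; the report does not say which.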

RESOLUTION

Applied reserved concurrency allocation to critical authentication services. Identified and throttled the Lambda function responsible for the load. Restored the database connection pool to normal levels (47 active, 169 idle connections). Validated that all services were operational before marking the incident resolved.
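
The reserved concurrency step can be sketched as follows. Reserved concurrency both guarantees capacity to a function and caps it, so the same setting can protect an authentication function and throttle a noisy one. This is a minimal sketch, not the configuration actually applied: the function names and numbers in PLAN are hypothetical. The client is expected to behave like the boto3 Lambda client, whose put_function_concurrency call takes FunctionName and ReservedConcurrentExecutions.

```python
from typing import Dict, Protocol

ACCOUNT_LIMIT = 1_000   # account-level concurrency limit cited in the report
MIN_UNRESERVED = 100    # Lambda requires at least 100 executions to stay unreserved


class LambdaClient(Protocol):
    """Subset of the boto3 Lambda client used by this sketch."""

    def put_function_concurrency(
        self, *, FunctionName: str, ReservedConcurrentExecutions: int
    ) -> dict: ...


def validate_plan(plan: Dict[str, int], account_limit: int = ACCOUNT_LIMIT) -> int:
    """Return the unreserved concurrency left by `plan`; raise if the plan is invalid."""
    if any(reserved < 0 for reserved in plan.values()):
        raise ValueError("reserved concurrency cannot be negative")
    unreserved = account_limit - sum(plan.values())
    if unreserved < MIN_UNRESERVED:
        raise ValueError(f"plan leaves only {unreserved} unreserved executions")
    return unreserved


def apply_plan(client: LambdaClient, plan: Dict[str, int]) -> None:
    """Validate the whole plan first, then apply it one function at a time."""
    validate_plan(plan)
    for name, reserved in plan.items():
        client.put_function_concurrency(
            FunctionName=name, ReservedConcurrentExecutions=reserved
        )


# Hypothetical plan: guarantee capacity for authentication, cap the noisy function.
PLAN = {"auth-token-issuer": 200, "legacy-router": 50}

# With AWS credentials configured, the plan could be applied with:
#   import boto3
#   apply_plan(boto3.client("lambda"), PLAN)
```

Validating the full plan before the first call avoids leaving the account half-configured if the totals do not fit under the limit.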

PREVENTIVE MEASURES

Immediate (Completed): Reserved concurrency for all critical Lambda functions. Enhanced monitoring with automated on-call escalation.
This Week: Per-customer rate limiting at the API Gateway layer. Deprecation of the legacy routing component that was the root cause. Customer documentation on API batching best practices.
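
To illustrate the idea behind per-customer rate limiting, here is a minimal in-process token bucket keyed by customer. It is a sketch of the general technique only, not the API Gateway configuration planned here. The class names, customer identifiers, and the rate and burst values are all hypothetical.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows up to `burst` requests at once, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.tokens = burst          # start with a full bucket
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill for the elapsed time, never exceeding the bucket capacity.
        self.tokens = min(self.burst, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerCustomerLimiter:
    """Keeps one independent bucket per customer, so one customer cannot starve another."""

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate
        self._burst = burst
        self._clock = clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, customer_id: str) -> bool:
        bucket = self._buckets.get(customer_id)
        if bucket is None:
            bucket = TokenBucket(self._rate, self._burst, self._clock)
            self._buckets[customer_id] = bucket
        return bucket.allow()


# Hypothetical limits: 10 requests/s sustained, bursts of up to 20 per customer.
limiter = PerCustomerLimiter(rate=10.0, burst=20.0)
if not limiter.allow("customer-123"):
    print("429 Too Many Requests")
```

Because each customer has a separate bucket, a single high-volume integration exhausts only its own allowance while other customers continue to be served.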
Next 30 Days: Load testing to validate 200k+ requests/hour capacity. Bulk operations coordination policy for high-volume integrations. Database connection pool optimization.
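
For planning the load test and the connection pool work, the 200k requests/hour target can be converted into a required concurrency and a per-environment connection budget. In this sketch only the 200,000 figure comes from the report. The mean invocation duration, the database connection ceiling, and the reserved connection count are assumptions chosen for illustration.

```python
import math

TARGET_REQUESTS_PER_HOUR = 200_000   # capacity target from the report

# Assumed values for illustration only; none of these appear in the report.
ASSUMED_MEAN_DURATION_S = 2.0        # mean Lambda invocation duration
ASSUMED_DB_MAX_CONNECTIONS = 500     # database connection ceiling
ASSUMED_RESERVED_CONNECTIONS = 100   # held back for admin and maintenance sessions


def required_concurrency(requests_per_hour: int, mean_duration_s: float) -> int:
    """Concurrent executions needed to sustain the rate (Little's law, rounded up)."""
    return math.ceil(requests_per_hour / 3600 * mean_duration_s)


def max_pool_size_per_environment(db_max: int, reserved: int, concurrency: int) -> int:
    """Largest per-environment pool that keeps total connections under the ceiling."""
    return (db_max - reserved) // concurrency


concurrency = required_concurrency(TARGET_REQUESTS_PER_HOUR, ASSUMED_MEAN_DURATION_S)
pool_size = max_pool_size_per_environment(
    ASSUMED_DB_MAX_CONNECTIONS, ASSUMED_RESERVED_CONNECTIONS, concurrency
)
print(f"required concurrency: {concurrency}")
print(f"max connections per execution environment: {pool_size}")
```

Under these assumptions the target needs 112 concurrent executions, each limited to 3 database connections. The point of the calculation is that the pool size per execution environment has to be derived from the concurrency ceiling, otherwise a concurrency spike multiplies directly into database connections.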