SSL Cache Limits - Degraded Performance

Incident Report for Spotlightr

Resolved

Major incident involving all services and platforms.

Start time: 10:00 AM CEST
Resolved: 6:00 PM CEST
Duration: 8 hours
Impacted services: Video Player, Data Services and Access, API, Connected Platforms, Integrations
Severity: High
Performance Impact: All services experienced either a significant slowdown or a complete denial of service, including timed-out requests.

We experienced a prolonged service interruption due to an issue on our load balancer, the server responsible for securely routing traffic to our infrastructure. A misconfiguration prevented secure (SSL/TLS) sessions from being reused efficiently, so far more connections than usual required a full handshake. This led to high CPU usage and, eventually, to timeouts and failures in serving content.

Resolution involved:
Restarting all services to reinitialize the secure session cache
Adjusting configuration limits and other settings to support more concurrent connections in the future
Revalidating that SSL session reuse and performance optimizations were working as intended
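
For readers who want to check session reuse against their own endpoints, here is a minimal sketch using Python's standard library. It is an illustration only, not the tooling we used; the function name and the host are hypothetical. It caps the connection at TLS 1.2 so the session is available as soon as the handshake completes, which means a server that only accepts TLS 1.3 will reject it.

```python
import socket
import ssl


def tls_session_reused(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Connect twice and report whether the second handshake resumed the first session.

    Hypothetical helper for illustration; it needs network access to `host`.
    """
    context = ssl.create_default_context()
    # Assumption: cap at TLS 1.2, where the session is set once the handshake ends.
    context.maximum_version = ssl.TLSVersion.TLSv1_2

    # First connection: full handshake, then keep the negotiated session.
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as first:
            session = first.session

    # Second connection: offer the saved session and see whether the server accepts it.
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host, session=session) as second:
            return bool(second.session_reused)
```

Calling tls_session_reused("example.com") returns True only if the server resumed the earlier session. A False result on a single attempt is not conclusive, since a load-balanced endpoint may route the two connections to different backends.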

What we are doing to make sure this doesn't happen again:
We will be upgrading our load balancing infrastructure to a more modern version with better observability and performance control. We will also improve our monitoring to detect this type of issue earlier, and review configuration and system resource limits to ensure they align with expected traffic levels.
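
As a rough illustration of the kind of early-warning check this monitoring could include, the sketch below flags the pattern seen in this incident: a low share of resumed sessions together with high CPU usage. The function names, counters, and thresholds are assumptions chosen for the example, not our production values.

```python
def session_reuse_ratio(new_handshakes: int, resumed_handshakes: int) -> float:
    """Share of handshakes that resumed an existing session (0.0 to 1.0)."""
    total = new_handshakes + resumed_handshakes
    if total == 0:
        # No traffic in the sampling window, so there is nothing to flag.
        return 1.0
    return resumed_handshakes / total


def should_alert(
    new_handshakes: int,
    resumed_handshakes: int,
    cpu_percent: float,
    min_reuse_ratio: float = 0.5,   # assumed threshold
    max_cpu_percent: float = 80.0,  # assumed threshold
) -> bool:
    """Alert when session reuse is low and CPU usage is high at the same time."""
    ratio = session_reuse_ratio(new_handshakes, resumed_handshakes)
    return ratio < min_reuse_ratio and cpu_percent > max_cpu_percent
```

Requiring both conditions keeps the check quiet during ordinary CPU spikes and during low-traffic periods, when the reuse ratio is naturally noisy.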

We thank you for your patience and sincerely apologize for the disruption this caused. Reliability is our top priority, and we’re committed to strengthening our systems to avoid similar issues in the future.

If you have any questions, feel free to reach out to our support team.

Posted May 23, 2025 - 08:00 UTC