Mandatory Availability Monitoring

Annual Report on Russian Services Reliability 2024

An in-depth analysis of a full year of monitoring data: which services went down most often, how quickly they recovered, and where the market is heading.

Dashboard showing annual uptime statistics for monitored Russian web services across 2024

Article Content

A Year Under the Microscope

Throughout 2024, VantaStatus continuously monitored 340 public-facing services operating within the Russian Federation — from banking gateways and e-commerce platforms to government portals and cloud providers. This report aggregates 28.4 million individual health-check pings collected at 60-second intervals from 12 geographically distributed probe nodes.
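As a rough illustration of how a stream of 60-second health checks could be turned into an availability figure and a list of qualifying outages, here is a minimal Python sketch. It assumes each check is reduced to a single pass/fail value; how results from the 12 probe nodes are combined is not described in this report, so that step is left out. The function names and the sample series are hypothetical.

```python
from itertools import groupby

CHECK_INTERVAL_S = 60   # from the report: one check every 60 seconds
MIN_OUTAGE_S = 5 * 60   # the report counts outages of 5 minutes or longer


def availability(checks):
    """Percentage of checks that succeeded."""
    if not checks:
        raise ValueError("no checks recorded")
    return 100.0 * sum(checks) / len(checks)


def outages(checks, interval_s=CHECK_INTERVAL_S, min_s=MIN_OUTAGE_S):
    """Return (start_index, duration_seconds) for every run of consecutive
    failed checks that lasts at least min_s seconds."""
    found = []
    index = 0
    for ok, run in groupby(checks):
        length = len(list(run))
        if not ok and length * interval_s >= min_s:
            found.append((index, length * interval_s))
        index += length
    return found


# Hypothetical series: 10 good, 6 failed, 3 good, 2 failed, 9 good checks.
sample = [True] * 10 + [False] * 6 + [True] * 3 + [False] * 2 + [True] * 9

print(round(availability(sample), 2))  # 22 of 30 checks passed
print(outages(sample))                 # only the 6-check run qualifies
```

In this sketch the two-check failure still lowers availability but is not counted as an outage, which mirrors the report's 5-minute threshold.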

99.17%

Weighted average availability across all 340 monitored services for the full calendar year 2024.

14 min

Mean time to recovery (MTTR) for incidents lasting longer than 5 minutes — a 22% improvement over 2023.

1,247

Total recorded outages lasting 5 minutes or longer. DNS resolution failures caused 38% of them, and application-layer errors caused 29%.

86

Services that achieved the gold-tier benchmark of 99.95%+ uptime. This represents 25.3% of the entire cohort.
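To put these percentages in concrete terms, an availability figure can be converted into a downtime budget for the year. The short Python sketch below assumes downtime is measured against the full calendar year (2024 was a leap year) with no exclusions for planned maintenance; the report does not state whether such exclusions apply. The function name is illustrative.

```python
MINUTES_IN_2024 = 366 * 24 * 60  # leap year: 527,040 minutes


def downtime_minutes(availability_pct, period_minutes=MINUTES_IN_2024):
    """Minutes of downtime implied by an availability percentage."""
    return period_minutes * (100.0 - availability_pct) / 100.0


# Gold-tier threshold: about 263.5 minutes, or roughly 4 hours 24 minutes.
print(round(downtime_minutes(99.95), 1))

# Cohort-wide weighted average: about 4,374 minutes, or roughly 73 hours.
print(round(downtime_minutes(99.17), 1))
```

Seen this way, a single multi-hour incident is enough to use up most of a gold-tier service's annual budget.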

Data & Charts

Incident Distribution by Sector

The financial sector recorded the most incidents overall, although its individual services still maintained strong availability. Cloud infrastructure providers experienced the highest-severity events, with three multi-hour outages affecting downstream customers across multiple regions.

Banking & Fintech

312 incidents

Sberbank Online, Tinkoff, and Alfa-Bank were the most frequently disrupted. Peak incident density occurred during the March 15–17 maintenance window when 14 services reported simultaneous degradation. Average downtime per event: 8 minutes.

Cloud & Hosting

187 incidents

Yandex.Cloud recorded the longest single outage of the year — 4 hours and 12 minutes on June 22, triggered by a BGP routing anomaly in the Moscow exchange. Selectel and Timeweb followed with 61 and 44 incidents respectively. MTTR for cloud providers: 23 minutes.

E-Commerce

245 incidents

Wildberries and Ozon together accounted for 138 incidents, largely concentrated during the 11.11 and Black Friday sales events. Wildberries' November 11 peak saw response times exceeding 12 seconds for 3.5 hours. Average downtime per event: 11 minutes.

Government Portals

94 incidents

Gosuslugi maintained 99.61% availability, its best annual figure since monitoring began. The largest disruption — 1 hour 48 minutes on May 9 — was attributed to planned load-testing that exceeded capacity thresholds. Regional portals averaged 98.44%.

Messaging & Social

156 incidents

VKontakte experienced 47 incidents, including a notable 2-hour outage on August 3 caused by a certificate renewal failure. Telegram's web client (web.telegram.org) reached 99.78%, an improvement of 0.4 percentage points. MyWorld (formerly Mail.ru) recorded 33 incidents.

Streaming & Media

121 incidents

KinoPoisk and Okko led this category with 28 and 22 incidents. Peak load during the December 31 New Year programming window caused 19 services to experience elevated error rates simultaneously. Average downtime per event: 6 minutes.

Bar chart comparing incident frequency across six service sectors in the 2024 annual reliability report

Conclusions

Key Findings & Outlook

The 2024 data reveals a maturing infrastructure landscape. While total incident counts remain elevated due to the expanding service ecosystem, recovery speeds and overall availability metrics improved across nearly every category compared to the previous year.

DNS-related failures decreased by 18% year-over-year, suggesting that operators are investing in redundant resolver architectures and anycast deployments. The most significant improvement came from the cloud sector, where MTTR dropped from 34 minutes in 2023 to 23 minutes in 2024, driven by automated failover capabilities introduced by Yandex.Cloud and Selectel in Q2.
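For readers who want to reproduce this kind of comparison, the sketch below shows one way MTTR and its year-over-year change could be computed. It assumes MTTR is the plain mean duration of incidents longer than 5 minutes, as in the headline figure; whether the sector numbers use the same filter is an assumption. The incident durations are made up for illustration.

```python
from statistics import mean


def mttr_minutes(durations, min_minutes=5):
    """Mean duration of incidents lasting longer than min_minutes."""
    qualifying = [d for d in durations if d > min_minutes]
    if not qualifying:
        raise ValueError("no incidents above the threshold")
    return mean(qualifying)


def relative_change_pct(old, new):
    """Signed percentage change from old to new."""
    return 100.0 * (new - old) / old


# Hypothetical incident durations in minutes; the 3-minute blip is ignored.
print(mttr_minutes([3, 8, 12, 49, 23]))

# Cloud-sector MTTR from the report: 34 minutes in 2023, 23 minutes in 2024.
print(round(relative_change_pct(34, 23), 1))  # roughly a one-third reduction
```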

E-commerce platforms continue to be the most volatile segment, with sales-event-driven outages accounting for 41% of all incidents in this category. We recommend operators implement graduated load-shedding strategies and pre-warm CDN caches at least 48 hours before known high-traffic events.
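A graduated load-shedding policy can be sketched as a small admission function: as utilisation climbs, progressively higher-priority traffic is the only traffic still served. The Python example below is a minimal illustration, not a description of any platform's actual implementation. The priority classes and the 80% and 95% thresholds are hypothetical and would need tuning against real capacity data.

```python
from enum import IntEnum


class Priority(IntEnum):
    LOW = 0       # e.g. recommendations, analytics beacons
    NORMAL = 1    # e.g. catalogue browsing
    CRITICAL = 2  # e.g. checkout and payment


# (utilisation threshold, lowest priority still admitted), highest tier first.
# These values are illustrative assumptions.
SHED_TIERS = [
    (0.95, Priority.CRITICAL),
    (0.80, Priority.NORMAL),
]


def admit(utilisation, priority):
    """Return True if a request of this priority should be served
    at the given utilisation (0.0 to 1.0)."""
    for threshold, min_priority in SHED_TIERS:
        if utilisation >= threshold:
            return priority >= min_priority
    return True  # below every threshold: serve everything


print(admit(0.50, Priority.LOW))       # normal load, everything is served
print(admit(0.85, Priority.LOW))       # first tier: low-priority traffic shed
print(admit(0.97, Priority.CRITICAL))  # top tier: only critical traffic remains
```

Keeping the tiers in a table rather than in nested conditionals makes it easy to add steps or adjust thresholds ahead of a known sales event.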

Government portals showed remarkable consistency, with Gosuslugi's 99.61% uptime setting a new baseline for public-sector services. Regional portals, however, still lag at 98.44% — a gap we expect to narrow as federal infrastructure-sharing programs expand in 2025.

Looking ahead, the primary risk factors for 2025 remain peak-load capacity planning, DNS resilience, and cross-provider dependency chains. Services that achieved 99.95%+ uptime in 2024 shared three common traits: multi-region active-active deployments, automated incident response playbooks, and dedicated on-call engineering teams operating 24/7.

Top Performer

Gosuslugi — 99.61%

Led all government services and ranked third overall among 340 monitored endpoints. Its largest disruption, on May 9, lasted 1 hour 48 minutes; the government sector as a whole recorded 94 incidents.

Most Improved

Yandex.Cloud — +0.31%

Availability rose from 99.12% to 99.43%. The June 22 outage remains the sole four-hour event in an otherwise stable year.

Highest Risk

Wildberries — 97.82%

Recorded 78 incidents including three events exceeding 2 hours. Sales-event preparedness remains the critical gap.