Mandatory Availability Monitoring
Annual Report on Russian Services Reliability 2024
An in-depth analysis of a full year of monitoring data: which services went down most often, how long recovery took on average, and where the market is heading.
Article Content
A Year Under the Microscope
Throughout 2024, VantaStatus continuously monitored 340 public-facing services operating within the Russian Federation — from banking gateways and e-commerce platforms to government portals and cloud providers. This report aggregates 28.4 million individual health-check pings collected at 60-second intervals from 12 geographically distributed probe nodes.
99.17%
Weighted average availability across all 340 monitored services for the full calendar year 2024.
14 min
Mean time to recovery (MTTR) for incidents lasting 5 minutes or longer, a 22% improvement over 2023.
1,247
Total recorded outages lasting 5 minutes or longer. DNS resolution failures caused 38% of them and application-layer errors 29%.
86
Services that achieved the gold-tier benchmark of 99.95%+ uptime. This represents 25.3% of the entire cohort.
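The report does not spell out how these headline figures are derived from the raw pings. As a minimal sketch, assuming one boolean health-check result per minute for a single service, the outage count, MTTR, and availability could be computed roughly as follows. The function names and the sample data are hypothetical and are not VantaStatus's actual pipeline.

```python
from itertools import groupby

MIN_OUTAGE_MINUTES = 5  # threshold the report uses for a countable outage


def outage_durations(checks, min_minutes=MIN_OUTAGE_MINUTES):
    """Return lengths (in minutes) of failure runs lasting at least min_minutes.

    `checks` is a sequence of booleans, one per 60-second check:
    True means the check succeeded, False means it failed.
    """
    durations = []
    for ok, run in groupby(checks):
        length = sum(1 for _ in run)
        if not ok and length >= min_minutes:
            durations.append(length)
    return durations


def availability_percent(checks):
    """Share of successful checks, expressed as a percentage."""
    return 100.0 * sum(checks) / len(checks)


def mttr_minutes(durations):
    """Mean duration of the qualifying outages, or None if there were none."""
    return sum(durations) / len(durations) if durations else None


# Hypothetical sample: 100 checks with a 2-minute blip and an 8-minute outage.
sample = [True] * 40 + [False] * 2 + [True] * 30 + [False] * 8 + [True] * 20
outages = outage_durations(sample)
print(outages)                       # [8]
print(availability_percent(sample))  # 90.0
print(mttr_minutes(outages))         # 8.0
```

In this sketch the 2-minute blip lowers availability but is not counted as an outage, which mirrors the report's split between the availability figure and the count of longer outages.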
Data & Charts
Incident Distribution by Sector
The financial sector led in total incident count, though individual services maintained strong SLAs. Cloud infrastructure providers experienced the highest-severity events, including three multi-hour outages that affected downstream customers across multiple regions.
Banking & Fintech
312 incidents
Sberbank Online, Tinkoff, and Alfa-Bank were the most frequently disrupted. Peak incident density occurred during the March 15–17 maintenance window when 14 services reported simultaneous degradation. Average downtime per event: 8 minutes.
Cloud & Hosting
187 incidents
Yandex.Cloud recorded the longest single outage of the year — 4 hours and 12 minutes on June 22, triggered by a BGP routing anomaly in the Moscow exchange. Selectel and Timeweb followed with 61 and 44 incidents respectively. MTTR for cloud providers: 23 minutes.
E-Commerce
245 incidents
Wildberries and Ozon together accounted for 138 incidents, largely concentrated during the 11.11 and Black Friday sales events. Wildberries' November 11 peak saw response times exceeding 12 seconds for 3.5 hours. Average downtime per event: 11 minutes.
Government Portals
94 incidents
Gosuslugi maintained 99.61% availability, its best annual figure since monitoring began. The largest disruption — 1 hour 48 minutes on May 9 — was attributed to planned load-testing that exceeded capacity thresholds. Regional portals averaged 98.44%.
Messaging & Social
156 incidents
VKontakte experienced 47 incidents, including a notable 2-hour outage on August 3 caused by a certificate renewal failure. Telegram's web client (web.telegram.org) reached 99.78% availability, a 0.4 percentage-point improvement. MyWorld (formerly Mail.ru) recorded 33 incidents.
Streaming & Media
121 incidents
KinoPoisk and Okko led this category with 28 and 22 incidents, respectively. Peak load during the December 31 New Year programming window caused 19 services to experience elevated error rates simultaneously. Average downtime per event: 6 minutes.
Conclusions
Key Findings & Outlook
The 2024 data reveals a maturing infrastructure landscape. While total incident counts remain elevated due to the expanding service ecosystem, recovery speeds and overall availability metrics improved across nearly every category compared to the previous year.
DNS-related failures decreased by 18% year-over-year, suggesting that operators are investing in redundant resolver architectures and anycast deployments. The most significant improvement came from the cloud sector, where MTTR dropped from 34 minutes in 2023 to 23 minutes in 2024, driven by automated failover capabilities introduced by Yandex.Cloud and Selectel in Q2.
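To put the MTTR figures on a common footing, the short sketch below converts the cloud-sector drop from 34 to 23 minutes into a relative change, and back-calculates the 2023 baseline implied by the overall 14-minute MTTR and its stated 22% improvement. The helper names are hypothetical, and the implied baseline is derived arithmetic, not a figure stated in the report.

```python
def relative_change_percent(old, new):
    """Relative change from old to new, in percent (negative means a drop)."""
    return 100.0 * (new - old) / old


def implied_previous_value(current, improvement_percent):
    """Previous value implied by a current value and a percentage reduction."""
    return current / (1.0 - improvement_percent / 100.0)


# Cloud-sector MTTR: 34 minutes in 2023, 23 minutes in 2024.
print(round(relative_change_percent(34, 23), 1))  # -32.4

# Overall MTTR: 14 minutes in 2024, reported as a 22% improvement.
print(round(implied_previous_value(14, 22), 1))   # 17.9
```

On these numbers, the cloud sector's roughly 32% reduction is larger than the 22% improvement reported for the cohort as a whole.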
E-commerce platforms continue to be the most volatile segment, with sales-event-driven outages accounting for 41% of all incidents in this category. We recommend operators implement graduated load-shedding strategies and pre-warm CDN caches at least 48 hours before known high-traffic events.
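One way to read "graduated load shedding" is that lower-priority traffic is rejected first as utilization climbs, so that critical paths such as checkout keep working. The sketch below illustrates that idea only. The thresholds, priority levels, and function names are assumptions chosen for illustration and are not taken from any operator's configuration.

```python
# Hypothetical tiers, ordered from most to least severe:
# (utilization threshold, highest priority number still served).
# Priority 0 is the most critical traffic; larger numbers are less critical.
SHED_TIERS = [
    (0.95, 0),  # at 95%+ utilization, serve only priority 0 (e.g. checkout)
    (0.85, 1),  # at 85%+, also serve priority 1 (e.g. cart, login)
    (0.70, 2),  # at 70%+, also serve priority 2 (e.g. search)
]
MAX_PRIORITY = 3  # below 70%, serve everything (e.g. recommendations)


def max_served_priority(utilization):
    """Return the least critical priority still served at this utilization."""
    for threshold, max_priority in SHED_TIERS:
        if utilization >= threshold:
            return max_priority
    return MAX_PRIORITY


def should_serve(request_priority, utilization):
    """Decide whether a request of the given priority is served or shed."""
    return request_priority <= max_served_priority(utilization)


print(should_serve(3, 0.50))  # True
print(should_serve(3, 0.75))  # False
print(should_serve(0, 0.99))  # True
```

A production system would typically smooth the utilization signal and add hysteresis so that tiers do not flap during a sales-event spike; both are omitted here for brevity.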
Government portals showed remarkable consistency, with Gosuslugi's 99.61% uptime setting a new baseline for public-sector services. Regional portals, however, still lag at 98.44% — a gap we expect to narrow as federal infrastructure-sharing programs expand in 2025.
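The gap between 99.61% and 98.44% is easier to judge in hours of downtime. The sketch below converts an availability percentage into annual downtime, assuming availability is measured against total calendar time in 2024 (a leap year). The function name is hypothetical.

```python
MINUTES_IN_2024 = 366 * 24 * 60  # 2024 was a leap year: 527,040 minutes


def annual_downtime_hours(availability_percent, total_minutes=MINUTES_IN_2024):
    """Convert an availability percentage into hours of downtime per year."""
    downtime_minutes = total_minutes * (100.0 - availability_percent) / 100.0
    return downtime_minutes / 60.0


for label, pct in [
    ("Gosuslugi", 99.61),
    ("Regional portals (average)", 98.44),
    ("Gold-tier threshold", 99.95),
]:
    print(f"{label}: {annual_downtime_hours(pct):.1f} h")
```

Under that assumption, 99.61% corresponds to roughly 34 hours of downtime over the year, 98.44% to roughly 137 hours, and the 99.95% gold-tier threshold to under 4.5 hours.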
Looking ahead, the primary risk factors for 2025 remain peak-load capacity planning, DNS resilience, and cross-provider dependency chains. Services that achieved 99.95%+ uptime in 2024 shared three common traits: multi-region active-active deployments, automated incident response playbooks, and dedicated on-call engineering teams operating 24/7.
Top Performer
Gosuslugi — 99.61%
Led all government services, posting its best annual figure since monitoring began. The government category as a whole recorded 94 incidents; Gosuslugi's largest disruption was the 1 hour 48 minute load-testing event on May 9.
Most Improved
Yandex.Cloud — +0.31%
Availability rose 0.31 percentage points, from 99.12% to 99.43%. The June 22 outage was the sole four-hour event in an otherwise stable year.
Highest Risk
Wildberries — 97.82%
Recorded 78 incidents including three events exceeding 2 hours. Sales-event preparedness remains the critical gap.