Uptime is the number one priority for a telecommunications company, which makes monitoring a strategic initiative for RingCentral. To support this initiative, RingCentral needed a scalable monitoring solution that could keep pace with the growth of its business and infrastructure. This proved more challenging than expected: the company had outgrown the capacity of its Zabbix monitoring toolset and needed a replacement that offered high availability and more granular metrics. RingCentral also set a goal of streamlining its processes to more effectively manage development, configuration changes, and metrics and event collection across its ever-growing application environment, currently more than 400 different “homemade” components continuously developed by a team of 1,500 developers.
RingCentral chose to migrate to the open source InfluxDB stack. After an evaluation phase, it deployed InfluxDB to handle its growing metrics and event volume, Telegraf as the agent installed on every host (physical or virtual) to collect monitoring data, a pool of Kapacitor instances to avoid downtime (so that no trigger event would go unnoticed), and an in-house Kapacitor Manager to manage that pool.
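RingCentral's Kapacitor Manager is in-house software, and this page does not describe how it works. As a hypothetical illustration of the idea behind a Kapacitor pool, the Python sketch below keeps every alert task assigned to two live instances and reassigns tasks when an instance is marked down, so a single failed instance does not silence a trigger. The class name `KapacitorPool`, the replica count, and the instance and task names are all assumptions; the sketch only tracks assignments and does not talk to real Kapacitor instances.

```python
from dataclasses import dataclass, field


@dataclass
class KapacitorPool:
    """Hypothetical pool manager: keeps each alert task on `replicas` live instances.

    Only assignment bookkeeping is modeled; pushing tasks to real Kapacitor
    instances (e.g. through Kapacitor's HTTP task API) is left out.
    """

    instances: list[str]
    replicas: int = 2
    assignments: dict[str, list[str]] = field(default_factory=dict)
    down: set[str] = field(default_factory=set)

    def live(self) -> list[str]:
        # Instances not currently marked as failed.
        return [i for i in self.instances if i not in self.down]

    def _load(self, instance: str) -> int:
        # Number of tasks currently assigned to an instance.
        return sum(instance in owners for owners in self.assignments.values())

    def assign(self, task: str) -> list[str]:
        # Keep surviving owners, then top up from the least-loaded live instances.
        owners = [i for i in self.assignments.get(task, []) if i not in self.down]
        candidates = sorted(
            (i for i in self.live() if i not in owners),
            key=lambda i: (self._load(i), i),
        )
        while len(owners) < self.replicas and candidates:
            owners.append(candidates.pop(0))
        self.assignments[task] = owners
        return owners

    def mark_down(self, instance: str) -> None:
        # Record the failure and re-home every task that lost a replica.
        self.down.add(instance)
        for task, owners in list(self.assignments.items()):
            if instance in owners:
                self.assign(task)


if __name__ == "__main__":
    pool = KapacitorPool(["kap-a", "kap-b", "kap-c"])
    for task in ["cpu_high", "disk_full", "sip_errors"]:
        pool.assign(task)
    pool.mark_down("kap-a")
    for task, owners in pool.assignments.items():
        print(task, owners)
```

Running each alert on two instances means a firing trigger could produce duplicate notifications, so a setup like this would also need deduplication downstream; that is the price of tolerating a single instance failure.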
Using InfluxDB and Telegraf, RingCentral’s monitoring solution today provides visibility, integrated configuration and alerting for operational efficiency, and fast DevOps cycles across the four pillars of its product (Cloud PBX, contact center, video and meetings, and team messaging) as well as the capabilities built on top of these pillars (open platform, global presence, analytics, and user experience).
InfluxDays Presentation
In this talk, Yuri Ardulov, Principal System Architect at RingCentral, shares how to use Kapacitor with the Kapacitor Manager that RingCentral built.
10,000 hosts: major North America presence with 2 data centers (both bare-metal and virtual)
2.5 million metrics & 700,000 triggers: metrics collected & triggers defined
16,000 metrics every 10 seconds: the need for scale is evident, since a single app generates a large number of metrics
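For a sense of scale, and assuming each of the 16,000 metrics yields one data point per 10-second collection interval, the sustained write rate works out as follows:

```latex
\frac{16{,}000\ \text{points}}{10\ \text{s}} = 1{,}600\ \text{points/s},
\qquad
1{,}600\ \tfrac{\text{points}}{\text{s}} \times 86{,}400\ \tfrac{\text{s}}{\text{day}} = 138{,}240{,}000\ \text{points/day}
```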
“One of the requirements for us was no single point of failure. We can’t afford to have something go wrong with InfluxDB because the metrics stored in there are too important to help keep our service uptime commitments.”