Wayfair needed to efficiently monitor performance across its systems, which are spread across three major data centers. The data is used by its developers, business stakeholders, and internal alerting engine. Its 24/7 Ops Monitoring Center uses this data to continuously analyze the vital signs of Wayfair’s IT infrastructure and storefront operations. Rapid growth led the company to rethink its time series infrastructure. Its existing Graphite solution failed to scale with growth demands in ingest rate, storage, and high availability, and it required a major investment of engineering time to maintain core functionality.
Wayfair chose InfluxData to monitor system metrics and events across data centers, and to perform real user monitoring (RUM) to understand user experience on its e-commerce site. The goal is to marry these with business process events to provide better business insight and a competitive advantage. Wayfair uses Kafka MirrorMaker to replicate the data to all three locations and runs three six-node InfluxDB Enterprise clusters dedicated to different workloads.
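To make the RUM data flow more concrete, here is a minimal sketch of how a single page-load measurement could be encoded in InfluxDB line protocol before being published to a message queue or written to a cluster. The measurement name, tag names, and field name (`rum_page_load`, `store`, `datacenter`, `load_ms`) are hypothetical and are not Wayfair's actual schema. The sketch handles float fields only.

```python
def _escape(value: str, chars: str) -> str:
    """Backslash-escape the characters that line protocol treats as separators."""
    for ch in chars:
        value = value.replace(ch, "\\" + ch)
    return value


def to_line_protocol(measurement: str, tags: dict, fields: dict, timestamp_ns: int) -> str:
    """Build one line-protocol record: measurement,tags fields timestamp.

    Assumption: every field value is numeric and is written as a float.
    """
    if not fields:
        raise ValueError("line protocol requires at least one field")

    # Measurement names escape commas and spaces.
    head = _escape(measurement, ", ")

    # Tag keys and values escape commas, equals signs, and spaces.
    # Sorting tags by key is optional but keeps the output deterministic.
    tag_part = ",".join(
        f"{_escape(k, ',= ')}={_escape(str(v), ',= ')}"
        for k, v in sorted(tags.items())
    )
    if tag_part:
        head = f"{head},{tag_part}"

    field_part = ",".join(
        f"{_escape(k, ',= ')}={float(v)}" for k, v in fields.items()
    )
    return f"{head} {field_part} {timestamp_ns}"


# Hypothetical RUM sample: one page load observed for one storefront.
line = to_line_protocol(
    "rum_page_load",
    {"store": "store_a", "datacenter": "dc1"},
    {"load_ms": 812.5},
    1700000000000000000,
)
print(line)
```

Tags carry the dimensions people typically filter on (store, data center), while fields hold the measured values. That split is a common design choice for time series data, not something the case study specifies.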
24x7 infrastructure monitoring
Needed a better disaster recovery system for their data centers and 2,000 VMs running hundreds of apps
50 million+
Number of daily real-user monitoring measurements collected across eight stores
Reduced downtime
Strengthened infrastructure to handle 3x-5x typical traffic volume during Cyber 5 weekends
“As Wayfair has grown and matured its software development and data center operations over the past decade, and particularly over the last five years, we have embraced the principle of providing maximum visibility into our processes and systems.”