Prometheus Monitoring
A Guide to Prometheus Monitoring - Definition, Use Cases, Applications, and Resources
What is Prometheus?
Prometheus is an open source systems monitoring and alerting toolkit originally built at SoundCloud by ex-Googlers who wanted to monitor metrics on their servers and applications. It joined the Cloud Native Computing Foundation in 2016 as the second hosted project, after Kubernetes. It is maintained independently of any single company and is a very popular choice for monitoring Kubernetes metrics. Prometheus, like InfluxDB, is written in Go.
How does Prometheus Work as a Monitoring Solution?
The Prometheus website provides a good overview of the Prometheus monitoring solution and its underlying time series infrastructure. In short, to monitor a service with Prometheus, the service needs to expose a Prometheus endpoint: an HTTP interface that lists the service’s metrics and their current values. The Prometheus server then polls that interface on each service and stores the data. This architecture is referred to as polling-based, or pull-based, monitoring.
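To make the idea of a metrics endpoint concrete, here is a minimal sketch that uses only the Python standard library rather than an official Prometheus client library. The metric names, the `render_metrics` helper, and the port are hypothetical; the output follows the Prometheus text exposition format (`# HELP` and `# TYPE` comments followed by one `name value` line per sample).

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process metric store: name -> (type, help text, current value).
METRICS = {
    "app_requests_total": ("counter", "Total requests handled.", 42),
    "app_queue_depth": ("gauge", "Jobs currently waiting in the queue.", 7),
}


def render_metrics(metrics):
    """Render metrics in the Prometheus text exposition format."""
    lines = []
    for name, (kind, help_text, value) in metrics.items():
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {kind}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the current metric values at /metrics and nothing else."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(METRICS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, serving /metrics on the given (assumed) port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the listener. A Prometheus server configured with this host and port as a scrape target would then poll `/metrics` on its scrape interval; the service itself never initiates a connection.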
For Kubernetes environments, service discovery is well integrated: Prometheus finds the metrics endpoints, polls them, and gathers the metrics into the Prometheus server for monitoring and alerting. A benefit of the pull approach is that it does not require you to install an agent to collect the metrics, although you still need to deploy “exporters” to expose the metrics from the system(s) you are collecting metrics from.
Push vs. Pull Based Metric Collection and Monitoring
In the pull-based method, the monitoring agent polls the targets being monitored periodically and alerts based on that data. In the push method, telemetry and metrics are pushed to the monitoring agent (or more frequently a time series database), and monitoring is done either through the agent or other processes querying the database.
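The pull side of this description can be sketched as follows. This is an illustration, not how the Prometheus server is implemented: the function names, the 15-second interval, and the simplified parser (no labels, no timestamps) are all assumptions. The point it shows is that in a pull model the collector owns the target list and the schedule.

```python
import time
import urllib.request


def parse_samples(text):
    """Parse simple `name value` lines, skipping comments and blank lines.

    Simplification: labels (name{k="v"}) and timestamps are not handled.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        samples[parts[0]] = float(parts[1])
    return samples


def http_fetch(url):
    """Fetch the body of a metrics endpoint over HTTP."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def scrape_once(targets, fetch=http_fetch):
    """Pull: reach out to every target and record what each one reports."""
    results = {}
    for url in targets:
        try:
            results[url] = parse_samples(fetch(url))
        except OSError:
            results[url] = None  # unreachable target; a real system would flag it as down
    return results


def scrape_forever(targets, interval_seconds=15):
    """Poll all targets on a fixed (assumed) interval and print each round."""
    while True:
        print(time.time(), scrape_once(targets))
        time.sleep(interval_seconds)
```

In a push model the direction is reversed: each service would call out to the collector or database whenever it has data, and the collector would not need to know the list of targets in advance.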
When instrumenting your own application code, you need to choose between the push and pull collection methods. Either you send metrics out to another service via a client library, or you make them available to others through some network addressable target (like an HTTP API, for example).
Prometheus has become the de facto standard for pull-based metrics in Kubernetes. By formalizing the pull method, it gives all kinds of services and applications a common format in which to expose targets that metrics data can be pulled from.
The primary disadvantage of pull-based methods is that they don’t work well for event-driven time series (like individual requests to an API, or events in your infrastructure). Another disadvantage is that all metric endpoints have to be reachable from the server, implying a more elaborate secure network configuration. This can also become an issue for large-scale deployments that require clustering for high availability.
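A small simulation illustrates the first disadvantage. The event times and the 15-second scrape interval below are made up for the example. A poller that reads a cumulative counter at fixed intervals learns how many events have happened so far, but not when each one happened.

```python
def scrape_counter(event_times, scrape_interval, duration):
    """Return (scrape_time, cumulative_event_count) pairs seen by a poller."""
    samples = []
    t = scrape_interval
    while t <= duration:
        # The poller only sees the counter's value at the moment of the scrape.
        count = sum(1 for e in event_times if e <= t)
        samples.append((t, count))
        t += scrape_interval
    return samples


# Hypothetical request timestamps, in seconds: a burst of three, then two stragglers.
events = [1.2, 1.3, 1.4, 17.0, 44.9]

samples = scrape_counter(events, scrape_interval=15, duration=45)
print(samples)  # [(15, 3), (30, 4), (45, 5)]
```

The burst at 1.2 to 1.4 seconds shows up only as "3 events by the 15-second mark". A push pipeline that forwarded each event as it occurred would retain all five timestamps.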
For Kubernetes-only monitoring, pull-based metrics collection might be just fine, but for distributed environments, especially IoT architectures, push-based monitoring is preferable. Most environments need to monitor and alert on both metrics (regular time intervals) and events (irregular time intervals), so it’s preferable to support both push and pull. This is currently a limitation of Prometheus, and it is where InfluxData’s Telegraf and Kapacitor can enhance the Prometheus environment.
Augmenting Prometheus to Support Monitoring Using Push and Pull
Kapacitor can read all the metrics generated following the Prometheus standard. This means that any service discovery target that works with Prometheus will work with Kapacitor. In addition, with InfluxDB’s native support for the Prometheus remote read and write protocol, you can use Prometheus to collect data and InfluxDB as your long-term, highly available, scalable data store.
Using Kapacitor to monitor Prometheus scrape targets allows for further streaming analytics, advanced anomaly detection, or the ability to add custom logic that gets triggered on the streaming data before storing it in the underlying data store.
The Telegraf operator project provides additional options for supplementing or replacing Prometheus monitoring solutions with the InfluxDB platform.
Prometheus Server Architecture
One of the core values and primary design objectives of Prometheus is simplicity. To achieve this, Prometheus focuses on a single-node architecture and tunes the server for performance within that single node. Prometheus doesn’t have clustering on its roadmap, probably because the additional complexity would run against its design objective of simplicity.
PromQL
Prometheus Query Language (PromQL) is a functional query language that enables users to select and aggregate time series data in real time. The result of an expression can be shown as a graph, viewed as tabular data in Prometheus’s expression browser, or consumed by external systems via the HTTP API, for example, to drive an alert or notification.
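As a sketch of how an external system might consume a PromQL result over the HTTP API, the example below builds a request URL for the instant-query endpoint (`/api/v1/query`) and parses a response. The server address, the `http_requests_total` metric, and the sample response body are assumptions made for illustration; no request is actually sent.

```python
import json
from urllib.parse import urlencode


def instant_query_url(base_url, promql):
    """Build the URL for an instant query against the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(payload):
    """Turn an instant-vector JSON response into (labels, value) pairs."""
    doc = json.loads(payload)
    if doc.get("status") != "success":
        raise ValueError(f"query failed: {doc.get('error')}")
    return [
        (series["metric"], float(series["value"][1]))
        for series in doc["data"]["result"]
    ]


# Hand-written sample in the shape the API returns for a vector result.
SAMPLE_RESPONSE = (
    '{"status":"success","data":{"resultType":"vector","result":'
    '[{"metric":{"job":"api"},"value":[1700000000,"12.5"]}]}}'
)

# Per-job request rate over the last five minutes (hypothetical metric name).
url = instant_query_url(
    "http://localhost:9090",
    "sum by (job) (rate(http_requests_total[5m]))",
)
rows = parse_instant_vector(SAMPLE_RESPONSE)
```

Here `rows` holds one entry per time series in the result, each pairing the series labels with its current value. Sample values arrive as strings in the JSON, so the parser converts them to floats.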
Augmenting Prometheus for High Availability
InfluxData’s InfluxDB takes a similar single-node approach with some similar design objectives, but InfluxDB Enterprise includes clustering to support environments that require high availability. Because InfluxDB Enterprise includes Kapacitor and Telegraf, you can keep any investment you have made in building Prometheus endpoints while storing data on multiple nodes in a clustered InfluxDB Enterprise deployment. InfluxDB’s native scripting and querying language, Flux, supports PromQL, which further reduces the headache of having to write queries for both data stores.
Prometheus data limitations
Conclusion
Using Prometheus for monitoring is a good choice in Kubernetes environments that require pull-based monitoring and alerting of metrics. For environments that require monitoring or alerting of both metrics and events, or where high availability is a requirement, consider augmenting your architecture with InfluxData’s InfluxDB Enterprise, or with InfluxDB and Kapacitor. InfluxData will continue to enhance support for Prometheus going forward. To stay up to date with the latest developments, please follow the project on GitHub.
Resources
Video
Blogs
- Expand Kubernetes Monitoring with Telegraf Operator
- InfluxDB Now Supports Prometheus Remote Read & Write Natively
- InfluxDB and Kapacitor: An Enhanced Data Model and Functional Query Language
- Monitoring the Kubernetes Nginx Ingress with the Nginx InfluxDB Module
- Monitoring Kubernetes Architecture
- Monitoring with Push vs. Pull: InfluxDB Adds Pull Support with Kapacitor
- InfluxDB 1.4 Now Available: InfluxQL Enhancements, Prometheus Read/Write, Better Compaction and a lot more!
- Percona Live Dublin recap
- Prometheus + InfluxDB: Thoughts After the Austin Monitoring Meetup
GitHub
Prometheus News
- Latest InfluxData Release Introduces Industry's First Advanced Kubernetes Auto Scaling and Prometheus Read/Write Support
- IoT Innovator | New InfluxData Update Adds Advanced Kubernetes Auto Scaling, Prometheus Read/Write Support
- InfluxData Releases Updated Prometheus Support; Selected to Present at KubeCon + CloudNativeCon Europe 2018
- Container Journal | InfluxData Gives Prometheus Monitoring a Real-Time Analytics Edge