Apache Kafka Burrow Monitoring

Powerful performance with an easy integration, powered by Telegraf, the open source data connector built by InfluxData.

Powerful Performance, Limitless Scale

Collect, organize, and act on massive volumes of high-velocity data with InfluxDB, the #1 time series platform built to scale with Telegraf. Any data is more valuable when you think of it as time series data.

See Ways to Get Started

Burrow is a monitoring companion for Apache Kafka that provides consumer lag checking as a service, without the need to specify thresholds. It monitors committed offsets for all consumers and calculates the status of those consumers on demand.

In other words, Burrow determines consumer status by evaluating each consumer's behavior over a sliding window. This can help you determine whether consumers are committing offsets, whether the committed offsets are increasing, whether lag is increasing, and more.
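To make the sliding-window idea concrete, here is a minimal Python sketch. It is not Burrow's actual evaluation logic: the function name `summarize_window`, the sample layout, and the three checks are assumptions chosen only to illustrate how a window of offset and lag samples can be reduced to a few observations.

```python
from typing import Dict, List, Tuple

# Hypothetical sample layout: (committed_offset, lag), ordered oldest to newest.
Sample = Tuple[int, int]


def summarize_window(samples: List[Sample]) -> Dict[str, bool]:
    """Reduce a window of samples to simple observations (illustrative only)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to compare")
    offsets = [offset for offset, _ in samples]
    lags = [lag for _, lag in samples]
    return {
        # Did the committed offset advance between the oldest and newest sample?
        "offsets_increasing": offsets[-1] > offsets[0],
        # Is the newest lag larger than the oldest lag in the window?
        "lag_increasing": lags[-1] > lags[0],
        # Did the consumer fully catch up at any point in the window?
        "caught_up": any(lag == 0 for lag in lags),
    }
```

A window in which offsets advance while lag also grows would suggest a consumer that is working but falling behind.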

An HTTP endpoint is provided to request status on demand, along with other relevant Kafka cluster information. There are also a number of configurable notifiers that can send status reports via email or HTTP calls to another service, so that current status information is always within reach.
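As a rough sketch of requesting status on demand, the Python below builds a status URL and fetches it. The path layout (`<prefix>/<cluster>/consumer/<group>/status`) is an assumption based on the `/v3/kafka` prefix used in the plugin configuration on this page; the function names, cluster name, and group name are hypothetical, so check the Burrow documentation for the exact endpoints in your version.

```python
import json
import urllib.request
from urllib.parse import quote


def status_url(server: str, cluster: str, group: str,
               api_prefix: str = "/v3/kafka") -> str:
    """Build the (assumed) consumer group status URL for a Burrow server."""
    base = server.rstrip("/") + "/" + api_prefix.strip("/")
    return f"{base}/{quote(cluster, safe='')}/consumer/{quote(group, safe='')}/status"


def fetch_status(url: str, timeout: float = 5.0) -> dict:
    """Fetch and decode the JSON body returned by the endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Example (requires a reachable Burrow instance; names are placeholders):
# print(fetch_status(status_url("http://localhost:8000", "local", "my-group")))
```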

Why use a Telegraf plugin for Burrow?

Monitoring your Apache Kafka infrastructure is important because Kafka may be the core pipeline for your application's data, and keeping it healthy helps you maintain high availability. The Burrow Telegraf Plugin can help by monitoring whether or not consumers are keeping up with incoming messages. It does this by giving you a view of the offsets that consumers are committing and of the broker's state.

You can collect these metrics into your InfluxDB instance and build alerts based on thresholds that you set to help maintain this critical piece of your application stack.
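A threshold check of this kind might look like the Python sketch below. It assumes a collected group record exposes the `status_code` and `total_lag` fields listed in the metrics section of this page, and that a `status_code` of 1 means OK; the function name `check_group` and the default threshold are hypothetical. In practice such rules would usually live in your alerting tool rather than in a script.

```python
from typing import Dict, List


def check_group(fields: Dict[str, int], max_total_lag: int = 1000) -> List[str]:
    """Return alert messages for one consumer group record (illustrative)."""
    alerts = []
    # status_code 1 corresponds to OK in the plugin's status mapping.
    if fields.get("status_code", 0) != 1:
        alerts.append(f"status_code is {fields.get('status_code', 0)}, expected 1 (OK)")
    # max_total_lag is a user-chosen threshold, not a plugin default.
    if fields.get("total_lag", 0) > max_total_lag:
        alerts.append(f"total_lag {fields['total_lag']} exceeds {max_total_lag}")
    return alerts
```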

How to monitor Kafka consumers using the Burrow Telegraf Plugin

Burrow Telegraf Plugin configuration lets you set response timeouts, limit concurrent connections, and filter clusters, consumer groups, and topics.

To configure the Burrow plugin in your own environment, use the following configuration. Note that you will need to replace the default values with information relevant to your own infrastructure. Default values are shown in the commented-out settings.

[[inputs.burrow]]
 ## Burrow API endpoints in format "schema://host:port".
 ## Default is "http://localhost:8000".
 servers = ["http://localhost:8000"]

 ## Override Burrow API prefix.
 ## Useful when Burrow is behind reverse-proxy.
 # api_prefix = "/v3/kafka"

 ## Maximum time to receive response.
 # response_timeout = "5s"

 ## Limit per-server concurrent connections.
 ## Useful in case of large number of topics or consumer groups.
 # concurrent_connections = 20

 ## Filter clusters, default is no filtering.
 ## Values can be specified as glob patterns.
 # clusters_include = []
 # clusters_exclude = []

 ## Filter consumer groups, default is no filtering.
 ## Values can be specified as glob patterns.
 # groups_include = []
 # groups_exclude = []

 ## Filter topics, default is no filtering.
 ## Values can be specified as glob patterns.
 # topics_include = []
 # topics_exclude = []

 ## Credentials for basic HTTP authentication.
 # username = ""
 # password = ""

 ## Optional SSL config
 # ssl_ca = "/etc/telegraf/ca.pem"
 # ssl_cert = "/etc/telegraf/cert.pem"
 # ssl_key = "/etc/telegraf/key.pem"
 # insecure_skip_verify = false
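The include and exclude filters above take glob patterns. The Python sketch below approximates how such a filter pair could behave, using the standard library's `fnmatch` rather than Telegraf's own glob implementation, so edge cases may differ. The function name `is_selected` and the rule that an empty include list means "match everything" are assumptions for illustration.

```python
from fnmatch import fnmatchcase
from typing import List


def is_selected(name: str, include: List[str], exclude: List[str]) -> bool:
    """Approximate an include/exclude glob filter (not Telegraf's code)."""
    # Assumption: an empty include list places no restriction on the name.
    if include and not any(fnmatchcase(name, pattern) for pattern in include):
        return False
    # Any matching exclude pattern rejects the name.
    return not any(fnmatchcase(name, pattern) for pattern in exclude)
```

For example, `is_selected("orders-prod", ["*-prod"], [])` evaluates to `True`, while `is_selected("orders-dev", ["*-prod"], [])` evaluates to `False`.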

The plugin also maps each group and partition status to a numeric status_code, including:

  • OK = 1
  • NOT_FOUND = 2
  • WARN = 3
  • ERR = 4
  • STOP = 5
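If you need to work with these values outside Telegraf, the mapping can be written as a small lookup table. The Python sketch below covers only the five statuses listed above; the name `status_to_code` and the fallback of 0 for any other status are assumptions made for this example.

```python
# Status-to-code pairs as listed on this page.
STATUS_CODES = {
    "OK": 1,
    "NOT_FOUND": 2,
    "WARN": 3,
    "ERR": 4,
    "STOP": 5,
}


def status_to_code(status: str, default: int = 0) -> int:
    """Translate a status string to its numeric code.

    The default of 0 for unlisted statuses is an assumption of this sketch.
    """
    return STATUS_CODES.get(status, default)
```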

Key Burrow metrics to use for monitoring

Some of the important Burrow metrics that you should proactively monitor include:

  • burrow_group (one event per consumer group) and burrow_partition (one event per topic partition)
    • status
    • status_code
    • partition_count
    • offset
    • total_lag
    • lag
  • burrow_topic (one event per topic offset)
    • offset

For more information, please check out the documentation.
