Writing Tasks and Setting Up Alerts for InfluxDB Cloud
By Faith Chikwekwe /
Use Cases, Product, Developer
May 07, 2020
(Update: InfluxDB 3.0 moved away from Flux and a built-in task engine. Users can use external tools, like Python-based Quix, to create tasks in InfluxDB 3.0.)
If you are using InfluxDB to monitor your data and systems, then alerts may be an essential part of your workflow. We currently have a system for monitoring whether your data enters a critical or non-critical state.
Here I’m going to give a detailed guide to setting up alerts with our InfluxDB Cloud product, as well as some best practices for getting a good experience out of them.
We’ll be working with the Flux scripting language, which makes it easy to write tasks and to understand the checks that we set up for our alerts.
For this tutorial, imagine that you are interested in monitoring and receiving alerts for your fruit collection company. You would probably think a lot about the amount of fruit that you are collecting and the farms that you’re collecting fruit from. You’ll also want to know when there are critical decreases in the amount of collected fruit. This is the kind of information that you’ll want to keep in mind as we go through this tutorial.
Step 1: Write your Flux query
An alert is only as good as the Flux query you write. Once signed in to InfluxDB Cloud, I like to start by using the Data Explorer or the Flux plugin in VSCode to examine the initial shape of my data.
If you’re using the Data Explorer, the Query Builder can be helpful for looking at the available buckets, fields, and measurements. Eventually, you may want to switch to using the Script Editor to write a more nuanced query.
You are going to turn this query into a task. You may also want to visualize the query/task output in order to see what you’re getting back. To that end, you might want to think about preserving columns that might give you clues as to why a metric changed from a critical to a non-critical state or vice versa.
For example, if you have initial data with the following columns:
_start, _stop, _time, _field, _measurement, farmName
And you’d like to be able to monitor the total fruit collected every hour by looking at the totalFruitsCollected measurement, it might also be helpful to preserve the farmName column so you can later visualize which farms have increased or decreased their output.
Generally, Flux queries that translate well into tasks and alerts will window their data using the aggregateWindow() function to examine changes over time intervals.
A simple query for our fruit collection example might look like this:
from(bucket: "farming")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => (r._measurement == "totalFruitsCollected"))
    |> filter(fn: (r) => (r._field == "fruits"))
    |> group(columns: ["farmName"])
    |> aggregateWindow(fn: sum, every: 1h)
    |> map(fn: (r) => ({_time: r._time, _stop: r._stop, _start: r._start, _measurement: "fruitCollectionRate", _field: "fruits", _value: r._value, farmName: r.farmName}))
Note that since the aggregateWindow() function eliminates all columns except _time, _stop, _start, _value and the columns in the group key, we have to use the map() function to restore our other columns.
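As an aside, and only as a rough sketch of my own rather than part of the original walkthrough, Flux’s set() function can also add constant columns back after aggregateWindow(), which avoids listing every column by hand:

from(bucket: "farming")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => (r._measurement == "totalFruitsCollected"))
    |> filter(fn: (r) => (r._field == "fruits"))
    |> group(columns: ["farmName"])
    |> aggregateWindow(fn: sum, every: 1h)
    // set() writes a constant value into a column on every row, restoring the
    // measurement and field names that aggregateWindow() dropped
    |> set(key: "_measurement", value: "fruitCollectionRate")
    |> set(key: "_field", value: "fruits")

Because farmName is part of the group key, it survives aggregateWindow() either way.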
Step 2: Convert your Flux query to a task
A task is a Flux query that runs on a schedule. Once you’ve queried for the right data, converting your query to a task is fairly straightforward.
In InfluxDB Cloud, you can use the “Save as Task” option to convert your query. In this dialog box, you need to set an every parameter (and perhaps an offset) for this task. After you’ve saved, I would recommend finding your task on the Tasks page and updating the range() function to use the task.every parameter instead of the built-in start and stop values. We’ll use a negative value, so that we’re looking back at that time interval every time the task runs.
task.every is often also used as the window for the aggregateWindow() function. This will be different for every case, but you’ll want to keep in mind how much data will be written within the selected interval and decide accordingly.
Our fruit collection company collects hundreds of fruits per hour. We’d like to be able to see the fruit collection in 1-hour intervals.
Next, you’ll want to update your query to match. I’m also going to update the aggregateWindow() function to use my task.every value.
In order to preserve the results of my task, I’m going to use the to() function to write my new fruitCollectionRate measurement back into the farming bucket. This way I can access the data for my dashboard and my notification checks.
My task will end up like this:
option task = {name: "fruitCollectedRate", every: 1h}

fruitCollected = from(bucket: "farming")
    |> range(start: -task.every)
    |> filter(fn: (r) => (r._measurement == "totalFruitsCollected"))
    |> filter(fn: (r) => (r._field == "fruits"))
    |> group(columns: ["farmName"])
    |> aggregateWindow(fn: sum, every: task.every)
    |> map(fn: (r) => ({_time: r._time, _stop: r._stop, _start: r._start, _measurement: "fruitCollectionRate", _field: "fruits", _value: r._value, farmName: r.farmName}))

fruitCollected
    |> to(bucket: "farming")
Note that you don’t have to add the task variable as shown above. If you add the task name and every value in the text boxes via the UI, the system will add the variable for you.
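If you do edit the option by hand and want the task to wait for late-arriving data, an offset can be added to the same record. A minimal sketch (the 5m value is just an illustrative choice):

// run every hour, but start each run 5 minutes past the hour
// so late-arriving data has time to land before the task queries it
option task = {name: "fruitCollectedRate", every: 1h, offset: 5m}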
Step 3: Create a dashboard to visualize your task output
This step is optional, but it’s a great way to validate that your task is operating properly before setting up your alerts.
You can head over to the Dashboards tab and click to create a dashboard. If you had additional metrics that you were monitoring about fruit, like badApplesPerBunch or rateOfLifeGivingLemons, you could add them to this dashboard as well.
My simple query to see my task’s output data could look like this:
from(bucket: "farming")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => (r._measurement == "fruitCollectionRate"))
    |> filter(fn: (r) => (r._field == "fruits"))
There are many tricks to visualize this data in more interesting ways, but this works well for now. Since we preserved farmName and grouped on it, every farm will have its own table in the output stream and its own line on the resulting graph, giving us more detailed visibility.
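As one example of those tricks (a sketch of my own rather than part of the original walkthrough), you can let the dashboard re-window the task output to match the zoom level by using the built-in v.windowPeriod variable:

from(bucket: "farming")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => (r._measurement == "fruitCollectionRate"))
    |> filter(fn: (r) => (r._field == "fruits"))
    // re-aggregate to the dashboard's window so long time ranges stay readable
    |> aggregateWindow(every: v.windowPeriod, fn: sum, createEmpty: false)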
Step 4: Create a check for your alert
Besides writing a well-formed query, setting up a well-calibrated check is the most important part of writing a good alert. In most cases, this will be an iterative process. You’ll want to make sure that critical alerts are not sent out too frequently, but that they quickly inform the recipient when something goes wrong.
We need to be alerted when there are not enough fruits coming into the collection center. Let’s say that our operation is critical when less than 20 fruits are collected within a 1-hour period from any farm.
Head over to the Monitoring and Alerting tab, and in the Checks column, let’s create an alert. We’re going to work with threshold checks for now. You’ll use the UI here to look in the farming bucket, and filter on the fruitCollectionRate measurement and the “fruits” field. You can also use the UI to window further and set an aggregate function. Once you’ve built your threshold query, let’s configure the check.
It is very important to set a tag here to associate with your notification rules later. Tags ensure that the alerts you fire off go to the right recipient. I only want the head collector to be alerted when things get critical. I’ll set my tag to be role == headCollector.
You can also configure your status message here. Remember how we preserved the farmName column? Let’s add that column here in the status message so the head collector knows which farm is having collection issues.
Check: ${ r._check_name } is: ${ r._level }. ${ r.farmName } is below collection threshold.
And for my check: when the value is below 20, set the status to CRIT.
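Behind the scenes, InfluxDB stores this check as a Flux task that writes statuses to the _monitoring bucket. What follows is only a rough sketch of what that generated task can look like, with placeholder IDs that InfluxDB fills in for you; the exact shape may differ between versions:

import "influxdata/influxdb/monitor"
import "influxdata/influxdb/schema"

option task = {name: "Fruit collection check", every: 1h}

// placeholder metadata -- InfluxDB generates the real check ID when the check is created
check = {_check_id: "000000000000000a", _check_name: "Fruit collection check", _type: "threshold", tags: {role: "headCollector"}}

crit = (r) => r.fruits < 20.0
messageFn = (r) => "Check: ${r._check_name} is: ${r._level}. ${r.farmName} is below collection threshold."

from(bucket: "farming")
    |> range(start: -1h)
    |> filter(fn: (r) => (r._measurement == "fruitCollectionRate" and r._field == "fruits"))
    // pivot the fruits field into a column so the threshold can reference r.fruits
    |> schema.fieldsAsCols()
    |> monitor.check(data: check, messageFn: messageFn, crit: crit)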
Step 5: Set up a notification endpoint
This is the way that the alert will be sent out to your recipient. It is pretty easy to set up a Slack webhook for this purpose. Other options such as PagerDuty or setting up your own HTTP endpoint are available for paid InfluxDB Cloud users.
Name your endpoint, then add the URL and other necessary parameters.
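If you’re curious what an endpoint looks like in Flux, the slack package exposes an endpoint() factory. A minimal sketch (the webhook URL is a placeholder):

import "slack"

// returns a function that a notification rule calls with a mapFn
// describing the channel, text, and color of each message
endpoint = slack.endpoint(url: "https://hooks.slack.com/services/EXAMPLE")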
Step 6: Make notification rules for your alert
Finally, we’re ready to start sending alerts about your data. For our purposes, we only care about critical alerts. We’d like to send out alerts every hour about how fruit collection is going.
Remember to add your tag here: role == headCollector. Any checks with a matching tag will be sent out using this notification rule. I’m going to send my head collector alerts via Slack. I’ll select the right endpoint from the dropdown menu. You can also further customize your message template here if you’d like.
When you click to create your notification rule, you should be good to go. Now you’ll be informed anytime you hit that critical status.
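Like checks, notification rules are stored as Flux tasks under the hood. A hedged sketch of roughly what this rule looks like (the IDs and webhook URL are placeholders):

import "influxdata/influxdb/monitor"
import "slack"

option task = {name: "Critical fruit alerts", every: 1h}

endpoint = slack.endpoint(url: "https://hooks.slack.com/services/EXAMPLE")

// placeholder metadata -- InfluxDB fills in the real rule and endpoint IDs
notification = {_notification_rule_id: "000000000000000b", _notification_rule_name: "Critical fruit alerts", _notification_endpoint_id: "000000000000000c", _notification_endpoint_name: "Slack"}

// read statuses written by the check, keep only critical ones tagged for the head collector,
// and forward each matching status to Slack
monitor.from(start: -1h, fn: (r) => r._level == "crit" and r.role == "headCollector")
    |> monitor.notify(data: notification, endpoint: endpoint(mapFn: (r) => ({channel: "", text: r._message, color: "danger"})))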
Resources to learn more about alerting
If you need more information about how alerts work, check out the docs here. The Flux documentation can be found here for information on writing good queries and tasks. Here’s a previous blog that gives another overview of our alerts system. If you have any questions about setting up your alerts, join our community Slack channel or post them in our community forums.