How to Pivot Your Data in Flux: Working with Columnar Data
By Michael Hall / Product, Use Cases, Developer / Oct 12, 2021
Relational databases are by far the most common type of database. As software developers, it's safe to say most of us got started on them, and probably still use them regularly. One thing they all have in common is how they structure data. InfluxDB, however, structures data a little differently.
RDBMS default table structure
With a relational database you have a collection of tables, each with its own fixed layout of columns, and you read and write data in rows. Each row has a column that holds a value for each field, and you get all of those values at the same time, giving you a table that looks like this:
| time | field1 | field2 | host |
| --- | --- | --- | --- |
| <data_tstamp> | 123 | 789 | myhost1 |
| <data_tstamp> | 456 | 101 | myhost2 |
InfluxDB result data
InfluxDB doesn’t follow this pattern. It returns data in what’s known as a columnar format, which just means that you get the data one column at a time rather than one row at a time. This is often a source of confusion for developers used to the row format of a relational database, but it has many benefits when working with time series data.
When you run a basic Flux query like this:
from(bucket: "telegraf")
|> range(start: -1d)
You will get data returned like this:
| table | _measurement* | _field* | _value | _start* | _stop* | _time | host* |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | mydata | field1 | 123 | <query_start> | <query_stop> | <data_tstamp> | myhost1 |
| 0 | mydata | field1 | 456 | <query_start> | <query_stop> | <data_tstamp> | myhost2 |
| 1 | mydata | field2 | 789 | <query_start> | <query_stop> | <data_tstamp> | myhost1 |
| 1 | mydata | field2 | 101 | <query_start> | <query_stop> | <data_tstamp> | myhost2 |
The first thing you’ll notice is that the table’s column headers are NOT your field names, as they would be if you were querying a SQL database. Instead, every query returns results with these same columns, and each record (represented by r in the query below) holds a single field’s value at a single point in time.
This is why when you filter your Flux query, it looks like this:
|> filter(fn: (r) => r["_measurement"] == "mydata")
|> filter(fn: (r) => r["_field"] == "field1")
This data structure allows for faster queries and aggregations in the most common usage patterns for InfluxDB. Because InfluxDB processes one column’s values at a time, it can run a calculation over just that field’s values, without having to read and discard unrelated columns row by row.
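As a sketch of that pattern, an aggregation such as mean() operates over a single field’s column of values once the query has been filtered down to it (the bucket, measurement, and field names here are the same hypothetical ones used above):

```flux
// Average the values of a single field over the last day.
// Because the data is columnar, mean() only touches field1's values.
from(bucket: "telegraf")
    |> range(start: -1d)
    |> filter(fn: (r) => r["_measurement"] == "mydata")
    |> filter(fn: (r) => r["_field"] == "field1")
    |> mean()
```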
Pivoting between column-like and row-like data layouts
There are times, however, when you are going to want your data to look more like the results of a traditional database query. To do that you need to perform a “pivot” operation. Pivoting data is the act of flipping it from having the fields in separate rows to having the fields in separate columns.
Flux provides a pivot function that does precisely this:
|> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")
This function flips your data so that every row represents a moment in time (rowKey: ["_time"]), each field becomes a column (columnKey: ["_field"]), and each of those columns holds that field’s value (valueColumn: "_value"). This gives you a result that looks like this:
| table | _measurement* | _start* | _stop* | _time | host* | field1 | field2 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | mydata | <query_start> | <query_stop> | <data_tstamp> | myhost1 | 123 | 789 |
| 0 | mydata | <query_start> | <query_stop> | <data_tstamp> | myhost2 | 456 | 101 |
Using fieldsAsCols instead
This is the most common way people want to pivot their data, because it puts the data into the same shape they’re used to from a traditional database. It’s so common, in fact, that we provide a shortcut function in the schema package just for that use case.
After adding import "influxdata/influxdb/schema" to the top of your Flux script, the pivot call above can be replaced with:
|> schema.fieldsAsCols()
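For context, here is the shortcut in a complete query (the bucket and measurement names are the same hypothetical ones used throughout this post):

```flux
import "influxdata/influxdb/schema"

// Equivalent to the explicit pivot() call above: one column per field.
from(bucket: "telegraf")
    |> range(start: -1d)
    |> filter(fn: (r) => r["_measurement"] == "mydata")
    |> schema.fieldsAsCols()
```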
Other ways to pivot
Tags as columns
Even though it’s the most common way to pivot, fields are not the only thing you can flip up into column headers. Suppose you want to compare values not across different fields, but across the same field from different sources, such as readings from different server hosts taken at the same time.
In our previous example, our data had a tag called host that held these values:
| table | _measurement* | _field* | _value | _start* | _stop* | _time | host* |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | mydata | field1 | 123 | <query_start> | <query_stop> | <data_tstamp> | myhost1 |
| 0 | mydata | field1 | 456 | <query_start> | <query_stop> | <data_tstamp> | myhost2 |
| 1 | mydata | field2 | 789 | <query_start> | <query_stop> | <data_tstamp> | myhost1 |
| 1 | mydata | field2 | 101 | <query_start> | <query_stop> | <data_tstamp> | myhost2 |
If we wanted to combine the values of each field from all of our hosts into a single record, we could do that with:
|> group()
|> pivot(rowKey:["_time", "_field"], columnKey: ["host"], valueColumn: "_value")
The first line re-groups our data, because by default InfluxDB puts values with different tag values (such as our different hosts) into separate result tables. Since pivot works on one result table at a time, we have to call group first to put the values from both hosts into a single table; then we can pivot that combined table around the host values.
This query gives us a table structure that looks like this:
| table | _field | _start | _stop | _time | myhost1 | myhost2 |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | field1 | <query_start> | <query_stop> | <data_tstamp> | 123 | 456 |
| 0 | field2 | <query_start> | <query_stop> | <data_tstamp> | 789 | 101 |
Combined columns
You can also use pivot to combine field names with tag values. Building on the example above, suppose you want to see the values of both field1 and field2 for each host in a single row.
We can do that with a slight modification to our previous function call:
|> group()
|> pivot(rowKey:["_time"], columnKey: ["host", "_field"], valueColumn: "_value")
By moving the _field data from the rowKey to the columnKey, alongside the host data, we get a table that combines both into unique columns.
| table | _start | _stop | _time | myhost1_field1 | myhost1_field2 | myhost2_field1 | myhost2_field2 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | <query_start> | <query_stop> | <data_tstamp> | 123 | 789 | 456 | 101 |
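Putting the pieces together, a complete query for this combined-column pivot might look like the following sketch (the bucket and measurement names are the same hypothetical ones used throughout this post):

```flux
// Flatten everything into one row per timestamp, with one column
// per host/field combination (e.g. myhost1_field1).
from(bucket: "telegraf")
    |> range(start: -1d)
    |> filter(fn: (r) => r["_measurement"] == "mydata")
    |> group()
    |> pivot(rowKey: ["_time"], columnKey: ["host", "_field"], valueColumn: "_value")
```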
Further reading
- pivot() function: The reference documentation on the pivot() function.
- schema.fieldsAsCols() function: The reference documentation on the fieldsAsCols() function from the schema package.
- group() function: The reference documentation on the group() function.
- TL;DR InfluxDB Tech Tips: Multiple Aggregations with yield() in Flux: This post teaches you how to use multiple yield() calls.
- TL;DR InfluxDB Tech Tips – Aggregating across Tags or Fields and Ungrouping: This post teaches how to use group() to combine result tables.