TL;DR Python, Pandas Dataframes, and InfluxDB
By
Susannah Brodnitz /
Product, Use Cases, Getting Started
Nov 09, 2022
Navigate to:
InfluxDB has over a dozen client libraries so developers can get started more easily and program in the language they’re most comfortable with. One of our most popular options is the Python client library. InfluxDB supports not just Python but pandas, a tool popular with data scientists for analyzing and manipulating data. You can use the client library to output data from InfluxDB into a DataFrame format pandas can ingest, and you can write pandas DataFrames directly to InfluxDB. This makes it simple to incorporate InfluxDB into your data science applications and take advantage of real-time monitoring and alerting.
Querying data from InfluxDB to a Pandas Dataframe
To query and return a pandas DataFrame in InfluxDB you first need to install the Python client library. To set it up you need to download that library into your Python project and set your credentials, including your URL, token, and the name of the bucket you want to send data to and read data from. Once you’ve set up your credentials you can query data.
Here’s an example query:
query= ‘’’
from (bucket: “system”)
|> range(start: -5m, stop: now())
|> filter(fn: (r) => r._measurement = “cpu”)
|> pivot(rowKey: [“_time”], ColumnKey: [“tag”], ValueColumn: “value”)
|> keep(columns: [“_time”, “usage_user”, “cpu”])
‘’’
client = InfluxDBClient(url, token, org, debug = false)
system_stats = client.query.api().query_data_frame()
display(system_stats.head())
This query is written in Flux, InfluxDB’s scripting and query language. It asks for data from the bucket “system” from the past five minutes and filters on the measurement “cpu.” The two most important functions to keep in mind when you’re working with InfluxDB and pandas are pivot
and keep
. Pivot
changes the data format to a column. Pandas dataframes are two-dimensional tables so pandas functions and tools expect data to be in that format and you’ll need to pivot the data you query from InfluxDB. Keep
tells the query what data to keep in this column and gets rid of any other columns you don’t want to include in the output pandas DataFrame.
After you’ve written your query, you need to call the InfluxDB client with your credentials and use the query API with the query_data_frame()
function to get the DataFrame format. Then you can display the data inside of the pandas DataFrame with .head
.
Writing data from a Pandas Dataframe to InfluxDB
You can also write data from a pandas dataframe into InfluxDB. Here’s some example code that does that:
system_stats.drop(columns = [‘result’, ‘table’, ‘start’, ‘stop’])
system_stats.set_index(“_time”)
_write_client.write(bucket.name, record = system_stats)
Any data that you don’t remove using the drop
function from your pandas DataFrame becomes a field when you write the data to InfluxDB. In this example query we’ve removed columns we don’t want to send as fields to InfluxDB because we want to make them tags. You also need to set your index, which will be your timestamp in InfluxDB. Then you need to use the Python client library write
function to write the data to InfluxDB. It takes in a bucket name and the record you want to write to that bucket. You also need to set up credentials for the client, as in the first example.
One thing to consider is the timestamp. Pandas comes with a datetime format. You can use that or you can use one of the following time formats:
“2018-10-26”, “2018-10-26 12:00”, or “2018-10-26 12:00:00-05:00”
Flux is a powerful analytics language, but you might want to use pandas because it’s very popular within the Python community. It has a lot of support and many people are already familiar with it and comfortable working with DataFrames. Combining pandas with the InfluxDB Python client library lets you easily incorporate InfluxDB into your applications to get the benefits of a purpose-built time series database.
To learn more about using pandas DataFrames with InfluxDB you can watch our Meet the Developer video on this topic.