Client Library Deep Dive: Python (Part 1)
By Jay Clifford / Developer, Product
Jul 26, 2023
Working with the new InfluxDB 3.0 Python CLI and Client Library
Community client libraries are back with InfluxDB 3.0. If you would like an overview of each client library, I highly recommend Anais’s blog post on their status.
In this two-part blog series, we take a deep dive into the new Python Client Library and CLI. By the end, you should have a good understanding of their current features, how their internals work, and my ideas for the future of both projects. From there, I hope you will feel equipped to contribute to them and have your say in their future.
In this post (Part 1), we will focus primarily on the Client Library because it underlies the Python CLI.
If you prefer, you can watch this tutorial in video form.
Python client library
So, let’s start off with the Python client library. The scope was simple: build a library that could write to and query InfluxDB 3.0. Because the write endpoint didn’t change in InfluxDB 3.0, we could bring forward much of the functionality from the V2 library, such as batch writes, data parsing, point objects, and much more. The query side, however, had to be rebuilt from scratch. We wanted to focus on the capabilities of Arrow Flight and support both SQL and InfluxQL queries. PyArrow also opened up better support for ecosystem libraries such as Pandas and Polars, but I’ll have more on this later.
Let’s build a simple Python application together that writes to and queries InfluxDB 3.0.
Install
To install the client library (I recommend creating a Python virtual environment first):
$ python3 -m venv ./.venv
$ source .venv/bin/activate
$ pip install --upgrade pip
$ pip install influxdb3-python
These commands create a Python virtual environment, activate it, upgrade pip (the Python package installer), and finally install the new client library.
Creating a client
In this section, we import our newly installed library and establish a client. I also discuss some configuration parameters and the reasoning behind them.
Let’s create a main.py file with the following code:
from influxdb_client_3 import InfluxDBClient3, Point
import pandas as pd
import numpy as np
import datetime
client = InfluxDBClient3(
    token="",
    host="eu-central-1-1.aws.cloud2.influxdata.com",
    org="6a841c0c08328fb1",
    database="pokemon-codex")
This example shows a minimal configuration for the client. Like previous clients, it requires the following parameters:
Parameter | Description |
token | Authenticates the client to read from and write to InfluxDB Cloud Serverless or Dedicated. Note: to use both features, you need a token with both read and write permissions. |
host | The InfluxDB host. Provide only the domain, without the protocol (https://). |
org | Cloud Serverless still requires the user’s organization ID for writing data to 3.0. Dedicated users can use an arbitrary string. |
database | The database you want to query from and write to. |
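The host parameter expects a bare domain. If your configuration stores a full URL instead, you can strip the protocol and any trailing path before passing the value to the client. The normalize_host helper below is hypothetical and not part of the client library. This is a minimal sketch that uses only the standard library:

```python
from urllib.parse import urlparse


def normalize_host(value: str) -> str:
    """Return only the domain (and port, if any) of a host setting.

    Hypothetical helper: accepts either a bare domain or a full URL such as
    "https://eu-central-1-1.aws.cloud2.influxdata.com/".
    """
    value = value.strip()
    # urlparse only fills in netloc when a scheme is present, so add one if missing
    if "://" not in value:
        value = "https://" + value
    netloc = urlparse(value).netloc
    if not netloc:
        raise ValueError(f"could not find a domain in {value!r}")
    return netloc


print(normalize_host("https://eu-central-1-1.aws.cloud2.influxdata.com/"))
```

You would then pass the result as host=normalize_host(...) when constructing InfluxDBClient3.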
I recommend creating one client per database, though you can update the _database instance variable if you only want to create a single client.
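If an application works with several databases, the one-client-per-database recommendation can be handled by creating each client lazily and reusing it. The sketch below is hypothetical. ClientRegistry is not part of the library. The factory callable stands in for something like lambda db: InfluxDBClient3(token=..., host=..., org=..., database=db), which lets the example run without a server.

```python
from typing import Callable, Dict, Generic, List, TypeVar

C = TypeVar("C")


class ClientRegistry(Generic[C]):
    """Create at most one client per database and reuse it (hypothetical helper)."""

    def __init__(self, factory: Callable[[str], C]) -> None:
        self._factory = factory
        self._clients: Dict[str, C] = {}

    def get(self, database: str) -> C:
        # Build the client on first use, then return the cached instance
        if database not in self._clients:
            self._clients[database] = self._factory(database)
        return self._clients[database]

    def databases(self) -> List[str]:
        return sorted(self._clients)


# Stand-in factory so the sketch runs without InfluxDB; it records each call
created: List[str] = []


def fake_factory(database: str) -> dict:
    created.append(database)
    return {"database": database}


registry = ClientRegistry(fake_factory)
a = registry.get("pokemon-codex")
b = registry.get("pokemon-codex")  # reuses the first client
c = registry.get("trainers")
```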
Next, let’s take a look at the advanced parameters of the client:
Parameter | Description |
flight_client_options | Passes options to the Arrow Flight client used for queries. You can find the configuration options here. Example. |
write_client_options | Passes options to the write client carried over from the V2 library, which you can find here. Example. |
**kwargs | Passes any remaining keyword arguments to the underlying V2 client; you can find them here. Example (gzip compression). |
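The gzip example refers to compressing write requests. The V2 client's constructor has an enable_gzip flag, which is presumably what gets passed through **kwargs. Line protocol is repetitive text, so it compresses well. The sketch below illustrates this with Python's standard gzip module on a synthetic, made-up batch of lines. It does not call the client library.

```python
import gzip

# Build a synthetic batch of 1,000 line protocol rows (made-up data)
lines = [
    f'caught,trainer=ash,id={i:04d} level={i % 100}i,hp=200i,type1="fire" {1690329600000000000 + i}'
    for i in range(1000)
]
payload = "\n".join(lines).encode("utf-8")

# Compress the body, as a client does before sending it with Content-Encoding: gzip
compressed = gzip.compress(payload)

ratio = len(compressed) / len(payload)
print(f"raw: {len(payload)} bytes, gzip: {len(compressed)} bytes ({ratio:.1%})")

# Decompressing on the server side must give back exactly the same bytes
assert gzip.decompress(compressed) == payload
```

The trade-off is a little extra CPU on both ends in exchange for less data on the wire. This tends to pay off for large batches sent over slower links.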
Let’s continue our original example by discussing the write functionality.
Writing data
Now that we have established our client, let’s look at the different methods you can use to write data to InfluxDB 3.0. Most will be familiar to you, as they follow the same ingestion methods as V2.
Let’s start off with basic point building:
# Continued from the Client's example
now = datetime.datetime.now(datetime.timezone.utc)
data = Point("caught").tag("trainer", "ash").tag("id", "0006").tag("num", "1")\
.field("caught", "charizard")\
.field("level", 10).field("attack", 30)\
.field("defense", 40).field("hp", 200)\
.field("speed", 10)\
.field("type1", "fire").field("type2", "flying")\
.time(now)
try:
    client.write(data)
except Exception as e:
    print(f"Error writing point: {e}")
In this example, we build our data using an instance of the Point class, which the client then translates into line protocol:
caught,trainer=ash,id=0006,num=1 caught="charizard",level=10i,attack=30i,defense=40i,hp=200i,speed=10i,type1="fire",type2="flying" <timestamp>
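To show roughly what that translation involves, here is a simplified serializer. It is hypothetical and is not the library's Point implementation. It follows the line protocol rules for the value types used above:

* integers get an i suffix;
* strings are double-quoted;
* commas, equals signs, and spaces in tag keys and values are escaped with a backslash.

It keeps tags in the order given. The real Point class may order them differently, for example sorted by key.

```python
from typing import Dict, Optional, Union

FieldValue = Union[bool, int, float, str]


def _escape_name(text: str) -> str:
    # Tag keys, tag values and field keys escape commas, equals signs and spaces
    for ch in (",", "=", " "):
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value: FieldValue) -> str:
    # bool must be checked before int because bool is a subclass of int
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    raise TypeError(f"unsupported field type: {type(value).__name__}")


def to_line_protocol(measurement: str, tags: Dict[str, str],
                     fields: Dict[str, FieldValue],
                     timestamp_ns: Optional[int] = None) -> str:
    """Serialize one point as a line of InfluxDB line protocol (simplified sketch)."""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    # Measurement names escape commas and spaces
    name = measurement.replace(",", "\\,").replace(" ", "\\ ")
    tag_part = "".join(f",{_escape_name(k)}={_escape_name(v)}" for k, v in tags.items())
    field_part = ",".join(f"{_escape_name(k)}={_format_field(v)}" for k, v in fields.items())
    line = f"{name}{tag_part} {field_part}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


print(to_line_protocol(
    "caught",
    {"trainer": "ash", "id": "0006", "num": "1"},
    {"caught": "charizard", "level": 10, "attack": 30, "defense": 40,
     "hp": 200, "speed": 10, "type1": "fire", "type2": "flying"},
    1690329600000000000,
))
```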
You can also format this as an array of points:
data = []
# Adding first point
data.append(
Point("caught")
.tag("trainer", "ash")
.tag("id", "0006")
.tag("num", "1")
.field("caught", "charizard")
.field("level", 10)
.field("attack", 30)
.field("defense", 40)
.field("hp", 200)
.field("speed", 10)
.field("type1", "fire")
.field("type2", "flying")
.time(now)
)
# Adding second point
data.append(
Point("caught")
.tag("trainer", "ash")
.tag("id", "0007")
.tag("num", "2")
.field("caught", "bulbasaur")
.field("level", 12)
.field("attack", 31)
.field("defense", 31)
.field("hp", 190)
.field("speed", 11)
.field("type1", "grass")
.field("type2", "poison")
.time(now)
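)
# Write both points in a single call
try:
    client.write(data)
except Exception as e:
    print(f"Error writing points: {e}")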
You can also write data using dictionaries and other structured data formats. One of my favorite ingest methods is the Pandas DataFrame.
Let’s take a look at an example utilizing this method:
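# Continued from the previous examples: one dictionary per caught Pokemon
caught_pokemon = [
    {"trainer": "ash", "id": "0006", "num": "1", "caught": "charizard", "level": 10,
     "attack": 30, "defense": 40, "hp": 200, "speed": 10,
     "type1": "fire", "type2": "flying", "timestamp": now},
    {"trainer": "ash", "id": "0007", "num": "2", "caught": "bulbasaur", "level": 12,
     "attack": 31, "defense": 31, "hp": 190, "speed": 11,
     "type1": "grass", "type2": "poison", "timestamp": now},
]
# Convert the list of dictionaries to a DataFrame indexed by timestamp
caught_pokemon_df = pd.DataFrame(caught_pokemon).set_index('timestamp')
# Print the DataFrame
print(caught_pokemon_df)
try:
    client.write(caught_pokemon_df, data_frame_measurement_name='caught',
                 data_frame_tag_columns=['trainer', 'id', 'num'])
except Exception as e:
    print(f"Error writing DataFrame: {e}")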
This example creates a Pandas DataFrame of the Pokemon caught in this session. We set the DataFrame’s index to the timestamp of when each Pokemon was caught, then pass the DataFrame to the write() function along with the following parameters:
Parameter | Description |
data_frame_measurement_name | The name of the measurement to write your Pandas DataFrame into. |
data_frame_tag_columns | A list of the column names to write as tags. The remaining columns, other than the timestamp, are written as fields. |
data_frame_timestamp_column | The column to use as the timestamp if your DataFrame’s index is not set to the timestamp. |
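The sketch below applies the same column roles to plain Python rows, which makes the parameters more concrete:

* columns listed as tags become tags;
* a designated column supplies the timestamp;
* every remaining column becomes a field.

rows_to_line_protocol is a hypothetical helper written for illustration. It is not how the client library converts DataFrames internally. To stay short, it only handles integer and string values and does not escape tags.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Iterable, List

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_ns(ts: datetime) -> int:
    # Expects a timezone-aware datetime; integer arithmetic avoids float rounding
    return (ts - EPOCH) // timedelta(microseconds=1) * 1000


def format_field(value: Any) -> str:
    # Only ints and strings are handled here to keep the focus on column roles
    if isinstance(value, int) and not isinstance(value, bool):
        return f"{value}i"
    if isinstance(value, str):
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    raise TypeError(f"unsupported field type: {type(value).__name__}")


def rows_to_line_protocol(rows: Iterable[Dict[str, Any]], measurement: str,
                          tag_columns: List[str], timestamp_column: str) -> List[str]:
    """Map rows to line protocol by column role (hypothetical; no tag escaping)."""
    lines = []
    for row in rows:
        # Columns named in tag_columns become tags, in the order given
        head = measurement + "".join(f",{col}={row[col]}" for col in tag_columns)
        # Every other column except the timestamp becomes a field
        fields = ",".join(
            f"{col}={format_field(value)}"
            for col, value in row.items()
            if col not in tag_columns and col != timestamp_column
        )
        lines.append(f"{head} {fields} {to_unix_ns(row[timestamp_column])}")
    return lines


caught_at = datetime(2023, 7, 26, tzinfo=timezone.utc)
rows = [
    {"trainer": "ash", "id": "0006", "num": "1", "caught": "charizard",
     "level": 10, "timestamp": caught_at},
    {"trainer": "ash", "id": "0007", "num": "2", "caught": "bulbasaur",
     "level": 12, "timestamp": caught_at},
]
for line in rows_to_line_protocol(rows, "caught", ["trainer", "id", "num"], "timestamp"):
    print(line)
```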
Make sure to check out the full example here. You can also find a batching example here.
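The linked batching example relies on the batching options of the V2 write client. The core idea can be sketched in a few lines. Serialized points are grouped into fixed-size batches, and each batch is sent as one newline-delimited request body. This illustration is simplified and hypothetical. It ignores the flush interval, retries, and background threads that a real batching writer handles. The send callable stands in for an HTTP write call.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield lists of at most `size` items (similar to itertools.batched in 3.12)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def write_in_batches(lines: Iterable[str], batch_size: int,
                     send: Callable[[str], None]) -> int:
    """Send line protocol in batches and return the number of requests made."""
    requests = 0
    for batch in batched(lines, batch_size):
        # Line protocol separates points with newlines, so one batch is one body
        send("\n".join(batch))
        requests += 1
    return requests


sent_bodies: List[str] = []
lines = [f"caught,trainer=ash num={i}i" for i in range(5)]
count = write_in_batches(lines, batch_size=2, send=sent_bodies.append)
print(count, "requests")
```

Larger batches mean fewer requests but more memory held per request. This is the trade-off that the V2 client's batch size setting controls.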
Writing data from a file
A much-requested feature for the previous client library was support for uploading and parsing more file formats. By leveraging PyArrow’s utilities, the new library can now upload files in the following formats:
CSV | Example here. |
JSON | Example here. |
Feather | Example here. |
ORC | Example here. |
Parquet | Example here. |
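Of the formats listed, CSV carries no type information. A reader therefore has to infer column types before the data can be written as integers, floats, or strings. PyArrow's CSV reader handles this for you. The standard-library sketch below shows a simplified version of the idea and is not PyArrow's actual algorithm. It picks one type per column, trying integer first, then float, and falling back to string. infer_column_types and read_typed_csv are hypothetical names.

```python
import csv
import io
from typing import Any, Callable, Dict, List


def _all_parse(values: List[str], parse: Callable[[str], Any]) -> bool:
    # True only if every value in the column can be converted by `parse`
    try:
        for value in values:
            parse(value)
    except (ValueError, TypeError):
        return False
    return True


def infer_column_types(rows: List[Dict[str, str]]) -> Dict[str, Callable[[str], Any]]:
    """Pick one converter per column: int if all values parse, else float, else str."""
    columns = list(rows[0].keys()) if rows else []
    types: Dict[str, Callable[[str], Any]] = {}
    for col in columns:
        values = [row[col] for row in rows]
        if _all_parse(values, int):
            types[col] = int
        elif _all_parse(values, float):
            types[col] = float
        else:
            types[col] = str
    return types


def read_typed_csv(text: str) -> List[Dict[str, Any]]:
    rows = list(csv.DictReader(io.StringIO(text)))
    types = infer_column_types(rows)
    return [{col: types[col](value) for col, value in row.items()} for row in rows]


sample = """trainer,caught,level,hp_ratio
ash,charizard,10,0.95
ash,bulbasaur,12,1
"""
for row in read_typed_csv(sample):
    print(row)
```

Inferring per column rather than per value keeps each column a single type. InfluxDB expects this, because a field cannot change type within a table.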
Querying data
Now that we have written some data into InfluxDB 3.0, let’s talk about how to query it back out. InfluxDB 3.0 provides a fully supported Apache Arrow Flight endpoint, which allows you to query using SQL or InfluxQL.
Let’s first take a look at a basic time series query in both SQL and InfluxQL:
from influxdb_client_3 import InfluxDBClient3
import pandas as pd
client = InfluxDBClient3(
token="",
host="eu-central-1-1.aws.cloud2.influxdata.com",
org="6a841c0c08328fb1",
database="pokemon-codex")
sql = '''SELECT * FROM caught WHERE trainer = 'ash' AND time >= now() - interval '1 hour' LIMIT 5'''
table = client.query(query=sql, language='sql', mode='all')
print(table)
influxql = '''SELECT * FROM caught WHERE trainer = 'ash' AND time > now() - 1h LIMIT 5'''
table = client.query(query=influxql, language='influxql', mode='pandas')
print(table)
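The two queries above ask the same question in different dialects. The main difference is how the relative time window is written: interval '1 hour' in SQL versus 1h in InfluxQL. Prepared statements are not available yet, so queries are plain strings. If you build them from variables, it is worth validating the inputs first. build_recent_query is a hypothetical helper. It reproduces both query strings from the example and rejects unexpected values rather than attempting to escape them.

```python
import re

# Measurement names accepted unquoted: letters, digits and underscores (an assumption)
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_recent_query(language: str, measurement: str, trainer: str,
                       hours: int, limit: int) -> str:
    """Build the 'recent rows for one trainer' query in SQL or InfluxQL (hypothetical)."""
    if not _IDENTIFIER.match(measurement):
        raise ValueError(f"unexpected measurement name: {measurement!r}")
    # Reject rather than escape, since escaping rules differ between dialects
    if "'" in trainer or "\\" in trainer:
        raise ValueError("trainer must not contain quotes or backslashes")
    if hours < 1 or limit < 1:
        raise ValueError("hours and limit must be positive")
    if language == "sql":
        unit = "hour" if hours == 1 else "hours"
        return (f"SELECT * FROM {measurement} WHERE trainer = '{trainer}' "
                f"AND time >= now() - interval '{hours} {unit}' LIMIT {limit}")
    if language == "influxql":
        return (f"SELECT * FROM {measurement} WHERE trainer = '{trainer}' "
                f"AND time > now() - {hours}h LIMIT {limit}")
    raise ValueError(f"unsupported language: {language!r}")


print(build_recent_query("sql", "caught", "ash", 1, 5))
print(build_recent_query("influxql", "caught", "ash", 1, 5))
```

The result can be passed straight to the client, for example client.query(query=build_recent_query("sql", "caught", "ash", 1, 5), language="sql", mode="pandas").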
Notice that a single client ran both the SQL query and the InfluxQL query. Let’s take a quick look at the query parameters to see how they shape the returned result.
Parameter | Description |
query | Your SQL or InfluxQL query as a string. We hope to add support for prepared statements soon. |
language | Either ‘sql’ or ‘influxql’. |
mode | One of five return modes: ‘all’ returns all queried data as a PyArrow Table; ‘pandas’ returns all data as a Pandas DataFrame; ‘chunk’ returns a Flight reader so you can iterate through large results in smaller batches (see example); ‘reader’ attempts to convert the stream to a RecordBatchReader; ‘schema’ returns the schema of the query result. |
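The key difference between these return modes is whether the whole result is materialized before you see it. The eager modes are ‘all’ and ‘pandas’. ‘chunk’ and ‘reader’ stream the result instead. The sketch below imitates that split with plain Python lists and generators. fetch_batches is a hypothetical stand-in for the Flight stream, which in the real client carries PyArrow record batches. Nothing here calls PyArrow or the client library.

```python
from typing import Dict, Iterator, List

Row = Dict[str, object]

# Records which batches the stand-in stream has produced, to show laziness
produced: List[int] = []


def fetch_batches(total_rows: int = 6, batch_size: int = 2) -> Iterator[List[Row]]:
    """Stand-in for a Flight stream that yields record batches one at a time."""
    for start in range(0, total_rows, batch_size):
        produced.append(start)
        yield [{"num": n, "trainer": "ash"}
               for n in range(start, min(start + batch_size, total_rows))]


def run_query(mode: str):
    if mode == "all":
        # Eager: drain the whole stream and return one combined result
        return [row for batch in fetch_batches() for row in batch]
    if mode == "chunk":
        # Streaming: return the iterator; no batch is fetched until the caller asks
        return fetch_batches()
    raise ValueError(f"mode not covered by this sketch: {mode!r}")


everything = run_query("all")
print(len(everything), "rows fetched eagerly")

produced.clear()
stream = run_query("chunk")
for batch in stream:
    # Process one small batch at a time; earlier batches can be discarded
    print("batch starting at", batch[0]["num"], "with", len(batch), "rows")
```

With a streaming mode, memory use is bounded by the batch size rather than the full result. This is why ‘chunk’ suits large queries.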
Future hopes
Rome wasn’t built in a day, and there are plenty of quality-of-life improvements and new features to add. Here is a table outlining a few:
Feature | Status |
Merge the Write API from the V2 Client to remove the external library dependency. | In progress |
Prepared Statements for queries | TO DO |
Arrow table writer for InfluxDB | TO DO |
Improve Polars support | TO DO |
Integrate delta sharing | TO DO |
Try it out for yourself
We built the foundations of what I hope will be a great community-driven client library for InfluxDB 3.0 in Python. My call to action: if you haven’t already, try out the library and put it through its paces. There are many edge cases we may not be aware of, and we won’t find them without the community’s help. I am eagerly awaiting your issues and feature requests.