InfluxDB SQL Queries with Python
By Josh Powers / Use Cases, Developer
Feb 21, 2023
InfluxData recently announced SQL support in InfluxDB Cloud. Users can now use familiar SQL queries to explore and analyze their time series data. This SQL support is built on Apache Arrow.
Apache Arrow is an open source project used as the foundation of InfluxDB’s SQL support. Arrow provides the data representation, storage format, query processing, and network transport layers. Apache Flight SQL provides a method for interacting with Arrow via SQL.
The Flight SQL DB API library provides a seamless experience for interacting with Apache Flight SQL in Python, and it works directly with InfluxDB's SQL support.
The following demonstrates an end-to-end example of using the library with InfluxDB Cloud. Users who need to execute other operations, like inserting data, querying with Flux, or any other API requests, can continue to use the existing influxdb-client-python client library.
Note: Flight SQL is useful for writing database-agnostic code that works across the different databases that support it, for example when building a UI or other user experience that might query several kinds of databases. That more general approach is what this article covers. If you intend to use only InfluxDB, we suggest using the upstream Flight libraries directly, because they have simpler dependencies and support both InfluxQL and SQL.
Install the library
The library is published on PyPI in order to make it easy to install:
pip install flightsql-dbapi
To use the library, users need to import the FlightSQLClient in their code:
from flightsql import FlightSQLClient
Connecting to InfluxDB Cloud
To create a client for interacting with InfluxDB, users need to provide three pieces of information:
- host: The hostname of the InfluxDB Cloud instance. Note that this does not require a protocol (e.g. "https://") or a port (e.g. 8086); the underlying library builds the full connection string.
- token: An InfluxDB token string with access to the bucket.
- bucket: The InfluxDB bucket to use with the connection, provided in the client metadata; each connection belongs to a specific bucket.
Below is an example of this connection process, where a user connects to InfluxDB Cloud and uses a token from the environment called “INFLUX_TOKEN” to connect to the “telegraf-monitoring” bucket:
import os

client = FlightSQLClient(
    host="us-east-1-1.aws.cloud2.influxdata.com",
    token=os.environ["INFLUX_TOKEN"],
    metadata={"bucket-name": "telegraf-monitoring"},
)
From here, users can use the client to interact with InfluxDB's SQL support. For example, a user can execute queries against the bucket or retrieve information about its tables and schema.
Executing a query
Passing a query to the client is a straightforward matter of handing the SQL query string to the execute() function. The step that follows, however, is unique to Flight SQL: the client must retrieve the results using a ticket gathered from an endpoint:
info = client.execute("select * from cpu limit 10")
reader = client.do_get(info.endpoints[0].ticket)
In Apache Flight SQL, once a query is submitted, the server returns a FlightInfo object. The object contains a list of endpoints describing the locations where the query results can be retrieved. In the case of InfluxDB Cloud, there is only a single endpoint, so the client gathers the query result using that endpoint's ticket.
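To make this pattern reusable, the two calls can be wrapped in a small helper. The sketch below (the run_query name is ours, not part of the library) iterates over every endpoint in the FlightInfo response and concatenates the results, which also covers servers that return more than one endpoint:
import pyarrow

def run_query(client, query: str) -> pyarrow.Table:
    """Execute a SQL query and return the combined result as a PyArrow Table."""
    info = client.execute(query)
    # Fetch the result behind each endpoint's ticket; InfluxDB Cloud returns exactly one endpoint.
    tables = [client.do_get(endpoint.ticket).read_all() for endpoint in info.endpoints]
    return pyarrow.concat_tables(tables)

cpu = run_query(client, "select * from cpu limit 10")
The rest of this post continues with the reader returned by do_get above.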
Query result handling
The response from the client is of type FlightStreamReader. This allows users to stream the data in batches, avoiding filling up memory and making processing more efficient:
for batch in reader:
    print(batch)
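Streaming is useful when the result set is large. As a rough sketch, and assuming the reader yields PyArrow Flight stream chunks whose data attribute holds a RecordBatch, rows can be counted batch by batch without materializing the full table:
total_rows = 0
for chunk in reader:
    # Each chunk wraps a RecordBatch; process it, then let it be garbage collected.
    total_rows += chunk.data.num_rows
print(f"received {total_rows} rows")
Keep in mind that a FlightStreamReader can only be consumed once, so re-run the query to get a fresh reader before trying a different approach.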
If the user only has a limited number of results from the query, they can use the read_all() function to collect all data chunks into a Table:
data = reader.read_all()
The return value is a PyArrow Table: a group of arrays representing columns of data in tabular form. Tables support a large number of performant operations and can be converted to other data types.
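For instance, PyArrow's compute functions can operate directly on the table's columns. The snippet below is a small sketch; the usage_user column name is an assumption based on Telegraf's cpu measurement and may differ in your data:
import pyarrow.compute as pc

table = reader.read_all()
# Average a numeric column directly on the Arrow table (column name assumed).
mean_usage = pc.mean(table["usage_user"])
print(mean_usage.as_py())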
For example, a user can convert the table directly into a Pandas DataFrame. This allows users to quickly manipulate, transform, and analyze their data using the powerful Pandas library:
reader.read_all().to_pandas()
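From there, standard Pandas operations apply. As a hypothetical example, again assuming Telegraf's cpu measurement with cpu and usage_user columns:
df = reader.read_all().to_pandas()
# Average CPU usage per core (column names assumed).
print(df.groupby("cpu")["usage_user"].mean())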
Users of the Polars library can take advantage of its Arrow support as well, creating a Polars DataFrame directly from the resulting Arrow Table:
polars.from_arrow(reader.read_all())
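Once in Polars, the usual expression API is available. A minimal sketch, assuming the same hypothetical usage_user column:
import polars as pl

df = pl.from_arrow(reader.read_all())
# Compute the average of a column with a Polars expression (column name assumed).
print(df.select(pl.col("usage_user").mean()))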
Schemas and tables
When working with SQL, it is helpful to know what schemas and tables are available. To find that information in InfluxDB Cloud, users can gather the tables with the following snippet:
info = client.get_tables()
tables = client.do_get(info.endpoints[0].ticket).read_all()
Or the following for the full schema:
info = client.get_db_schemas()
schema = client.do_get(info.endpoints[0].ticket).read_all()
As the table and schema information is relatively small, using read_all() here gets the user the data immediately.
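Both results are regular Arrow Tables, so the same conversions shown above apply; for a quick look, converting to Pandas works well:
print(tables.to_pandas())
print(schema.to_pandas())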
Try out SQL with InfluxDB and Python today
This post demonstrated how to use the Python Flight SQL library to connect to and retrieve data from InfluxDB Cloud, powered by IOx. Users can now run familiar SQL queries against the leading time series database and report on the results using Python.
Consider where you can use InfluxDB Cloud with SQL support and the Python Flight SQL library and give it a shot today!