InfluxQL vs SQL for InfluxDB
By
Anais Dotis-Georgiou /
Developer
Jan 03, 2025
Navigate to:
InfluxDB is a purpose-built time series database designed to handle high-write throughput and large volumes of time-stamped data. From monitoring system metrics to tracking IoT device readings and analyzing financial trends, it excels in scenarios where time is a fundamental factor. With the release of InfluxDB v3, users now benefit from dual query language support: SQL and InfluxQL. SQL, powered by the Apache DataFusion implementation, introduces the flexibility of a familiar, widely-used language, enabling seamless integration with existing analytics workflows. On the other hand, InfluxQL is a domain-specific query language tailored for time series data, offering optimized syntax for time-based aggregations and windowing. As the native query language for InfluxDB 1.x, InfluxQL is a natural choice for users migrating from earlier versions to v3, allowing them to migrate their existing queries seamlessly.
This blog post will show you the tools available for migrating queries and help you decide which query language to pick. While InfluxQL remains a powerful option, SQL has become the prominent choice for InfluxDB v3 due to its increased function support, wider adoption, and enhanced interoperability with other tools. This dual-language approach allows users to choose the best tool for their specific needs, whether leveraging advanced SQL capabilities or utilizing InfluxQL’s specialized time series constructs. This blog explores the distinctions between SQL and InfluxQL, their advantages, and scenarios where each excels.
To follow along with the examples in this post, use the Air Sensor dataset from the NOAA. You can easily write line protocol data to InfluxDB v3 Cloud through the UI (Load Data > Buckets > + Add Data > Line Protocol > Select the bucket you want to write to > Load the file or copy and paste the Line Protocol).
Some terminological differences
Before exploring the advantages and disadvantages of each language, it’s essential to understand the key terminological differences between them. In SQL, the concepts of databases, tables, and columns are central. A database is a container that organizes tables, each holding rows and columns of structured data. Columns represent specific attributes or properties of the data, such as names, dates, or values.
In InfluxQL, these concepts are represented as buckets, measurements, fields, and tags. A bucket is equivalent to a database and is a logical container for time series data. Measurements are similar to tables and group related data points, such as sensor readings or metrics. Fields represent the actual data values, like temperature or pressure, while tags are key-value pairs that provide metadata for filtering and organizing the data, such as location or device type. While the terminology differs, the underlying concepts remain analogous: a database corresponds to a bucket, a table to a measurement, and columns to fields and tags.
So, to summarize:
Database <-> Bucket
Table <-> Measurement
Columns <-> Fields and Tags
Fields and tags are converted to columns in InfluxDB v3, effectively eliminating any functional distinction between them. In earlier versions like InfluxDB 1.x, users needed to carefully decide which values to index, as indexing tags could impact performance due to cardinality concerns. With InfluxDB v3, these concerns are no longer an issue, as the architecture is optimized to handle high cardinality efficiently.
Basic syntactic differences
InfluxQL and SQL are pretty similar. For example, you would select all data from a table with:
#SQL or InfluxQL
SELECT * FROM "airSensors"
However, there are a few notable syntactic differences between the two languages. There are also some syntactic differences between SQL in InfluxDB v3 and the SQL you might be familiar with. This is because SQL in InfluxDB v3 uses the Apache DataFusion implementation of SQL, which has specific syntax requirements. For example, you must explicitly define the Arrow Datatypes in the query. In other words, if you want to query all of the Air Sensor data from the past five minutes, you have to specify the interval type in SQL:
#SQL
SELECT * FROM "airSensors" WHERE time >= now() - interval '5 minutes'
In contrast, in InfluxQL, you don’t have to specify the interval type. Also, notice how the duration literals differ for SQL and InfluxQL:
#InfluxQL
SELECT * FROM "airSensors" WHERE time >= now() - 5m
Similarly, if you want to query from a specific timestamp you have to specify the Datatype with SQL but not InfluxQL. Also, note the difference between the datetime format for SQL ISO 8601 format for InfluxQL:
#SQLSELECT * FROM "airSensors" time >= ('2023-12-01 18:35:30.000')::TIMESTAMP
#InfluxQL
SELECT * FROM "airSensors" time > '2024-12-01T18:35:30Z
Another significant syntactic difference between the languages appears in how you apply aggregation or selector functions over time. Imagine you want to find the average air temperature for sensor TLM0100
over the past hour, grouped into 5-minute intervals (from the Air Sensor dataset). InfluxQL can also be a great choice if your needs don’t involve sophisticated analytics, complex subqueries, or joins. Its straightforward syntax often makes it easier to use for simpler queries. For instance, here’s how you could calculate that average air temperature over 5-minute time windows with InfluxQL:
#InfluxQL
SELECT MEAN("temperature") FROM "airSensors" WHERE "sensor" = 'TLM0100' AND time > now() - 1h GROUP BY time(5m)
By contrast, the equivalent function in SQL would require the date_bin() function. It also requires that you explicitly define the Arrow Dataypes:
#SQL SELECT "sensor_id",
DATE_BIN(INTERVAL '5 minute', time, TIMESTAMP '2022-01-01 00:00:00Z') AS _time,
avg("temperature") as average_temp
FROM "airSensors"
WHERE time >= now() - interval '1 hour' AND "sensor" = 'TLM0100'
User experience and query tools
One of the most notable differences between InfluxQL and SQL is the user experience and tools available for each language. In InfluxDB v3, you can use several tools to query data in either InfluxQL or SQL, including:
- InfluxDB v3 Python CLI: This CLI extension to the InfluxDB v3 Python client library allows users to query and write data interactively. Additionally, it features an (Alpha) integration with OpenAI’s ChatGPT for querying data using natural language. This feature requires an OpenAI API key, and is accessible via the chatgpt command. Hopefully, this feature will make querying in InfluxQL and SQL much easier.
- HTTP API: The HTTP API is ideal for users who prefer lightweight, language-agnostic solutions or when client libraries are not available for the language of choice. It provides direct access to InfluxDB’s capabilities for querying and managing data.
- Client Libraries: These libraries support various programming languages, including C#, Go, Java, Python, and JavaScript. They are well-suited for developers building custom applications or integrations and offer a structured way to interact with InfluxDB’s APIs.
- Arrow Flight + RPC Client: Flight RPC clients can query data using either SQL or InfluxQL, leveraging InfluxDB v3-specific Flight RPC protocol. A single
DoGet()
request enables authentication, querying, and data retrieval, making it efficient for high-performance use cases. - Flight SQL Clients: These clients use the Flight SQL protocol to query InfluxDB Cloud Dedicated databases exclusively with SQL. While they lack support for InfluxQL, their design optimizes performance and interoperability for SQL-based workflows.
- InfluxDB Cloud v3 UI: InfluxDB Cloud v3 also provides the ability to query with SQL in the UI. You cannot use the UI to query with InfluxQL.
The Data Explorer in the InfluxDB v3 Cloud UI also offers an SQL sync feature, which helps craft SQL queries for you.
BI and visualization tool support
Certain visualization tools only support specific query languages, which can influence your choice. For example, SQL is compatible with the following tools:
In contrast, InfluxQL is compatible with:
If your workflow relies on a specific visualization tool, consider selecting a query language that aligns best with its requirements.
Advantages of InfluxQL over SQL
The main advantage of InfluxQL over SQL is for any 1.x user looking to migrate their queries over to InfluxDB v3. As an aside, if you are a 1.x user and have continuous queries, you’ll have to migrate your InfluxQL continuous queries to Flux tasks or leverage the Python Client Library and use a processing engine of your choice. For example, you could leverage Quix Streams.
Another advantage of InfluxQL is in schema exploration. While SQL has some schema exploration functions available, their offerings are much more limited.
InfluxQL is an excellent choice for schema exploration, particularly if you’ve deliberately structured your data using tags and fields to enhance organization after ingestion.
Advantages of SQL over InfluxQL
While some queries are syntactically simpler in InfluxQL, SQL offers far more functions. You can take a look at some of the differences.
SQL also has the following functions that InfluxQL doesn’t:
- Window Functions
- String Functions
- JOINS
- UNIONS
- Array Functions
- May Functions
- Hashing Functions
- And so much more (to see the full list of functions available, see the Apache DataFusion SQL Reference).
These functions allow you to perform more sophisticated analysis in SQL. For example, you could find the derivative or rate of change of temperature and carbon monoxide per second from the Air Sensors data with:
#SQL SELECT time, (co_delta_v / delta_t_ns) * 1000000000 as co_rate, (temperature_delta_v / delta_t_ns) * 1000000000 as temperature_rate FROM
(SELECT (lag(co, 1) OVER (ORDER BY time)) - co as co_delta_v,
(lag(temperature, 1) OVER (ORDER BY time)) - temperature as temperature_delta_v,
(lag(cast(time as bigint), 1) OVER (ORDER BY time)) - cast (time as bigint) as delta_t_ns,
time
FROM airSensors WHERE sensor_id='TML0100' AND time >= now() - interval '1 hour' )
Using a subquery, it computes the difference between consecutive rows for both co
(co_delta_v
) and temperature
(temperature_delta_v
), as well as the time difference in nanoseconds (delta_t_ns
). These differences are calculated using the lag()
function with a window ordered by time. The outer query then divides the value differences by the time difference and scales them to seconds by multiplying by 1,000,000,000, resulting in the co_rate
and temperature_rate
. The query filters the dataset to include only rows for the sensor TML0100
and within a specified time range, precisely analyzing how quickly these metrics change over time. This is particularly useful for monitoring environmental or operational trends in real-time.
Additional resources
If you enjoyed this blog post you might find the following resources valuable as you get started on your journey with InfluxQL or SQL and InfluxDB v3:
- InfluxCommunity Org: An org with repos containing examples of how to leverage InfluxDB with a variety of tech stacks, demos, InfluxDB v3 Client Libraries, and more
- ArrowFlightClient_Query_Examples: A repo showcasing how to use the Arrow Flight SQL Clients for Java, Go, and C++
- Arrow Flight + RPC Client: Docs for the Arrow Flight Clients
- HTTP API: Docs for the HTTP API
- Query data with SQL: Docs for querying with SQL and InfluxDB v3
- Query data with InfluxQL: Docs for querying with InfluxQL and InfluxDB v3
Wrapping up
I hope this blog post helps you understand the differences between InfluxQL and SQL in InfluxDB v3. As always, get started with InfluxDB v3 Cloud here. In the next post, we’ll cover how to run the project, dive into the architecture and logic, and discuss some of the pros and cons of the selected stack. If you need help, please contact us on our community site or Slack channel. If you are also working on a data processing project with InfluxDB, I’d love to hear from you!