TL;DR InfluxDB Tech Tips - All About the New Subqueries!
By
Regan Kuchan /
Product, Developer
Jan 26, 2017
Navigate to:
In this post we introduce subqueries; a new InfluxQL feature in InfluxDB version 1.2. Check in next week for more on the most interesting InfluxDB and TICK-stack related issues, workarounds, how-tos and Q&A from GitHub, IRC and the InfluxDB Google Group.
What is a subquery?
Subqueries are a new feature in InfluxDB version 1.2. They increase the functionality of your InfluxQL queries and allow you to gain more granular, meaningful insights into your data.
A subquery is a query that’s nested in the FROM
clause of another query. You can think of a subquery as a query that serves as a condition in a separate query. The syntax is pretty straightforward:
SELECT_clause FROM (SELECT_statement) [...]
where the subquery is the SELECT_statement
that appears inside parentheses in the FROM
clause of the main query, and the main query is everything outside of those parentheses. Both the subquery and the main query support every InfluxQL clause. When the system executes a query, it runs the subquery and then the main query on the results of that subquery.
When would I use a subquery?
Perform functions on functions on functions
Execute a function on the results of another function. This use case is equivalent to nested functions. The following query calculates the average value of passengers
for every subway
tag and returns only the maximum average value:
> SELECT MAX("mean") FROM (SELECT MEAN("passengers") FROM "schedule" GROUP BY "subway") name: schedule time max ---- --- 1970-01-01T00:00:00Z 32.75
Notice that the SELECT
clause in the main query specifies mean
as the field key in the MAX()
function. To understand how InfluxDB calculates those results, we’ve included the (annotated) output from just the subquery below:
> SELECT MEAN("passengers") FROM "schedule" GROUP BY "subway"
name: schedule
tags: subway=U2
time mean <---- The relevant field key for the main query
---- ----
1970-01-01T00:00:00Z 32.75 <---- This is the max value that gets returned
name: schedule
tags: subway=U6
time mean <---- The relevant field key for the main query
---- ----
1970-01-01T00:00:00Z 18
Perform additional analysis on the results of another query
Another use for subqueries is to execute a function on the results of a mathematical operation. This use case is equivalent to performing mathematical operations within a function. The following query calculates the number of spilled_coffee
s per passenger
and returns the average of those quotients:
> SELECT MEAN("spills_per_person") FROM (SELECT "spilled_coffee"/"passengers" AS "spills_per_person" FROM "schedule" GROUP BY "subway") name: schedule time mean ---- ---- 1970-01-01T00:00:00Z 0.13206144512134227
Notice that the SELECT
clause in the main query specifies spills_per_person
as the field key in the MEAN
function. To understand how InfluxDB calculates those results, we’ve included the (annotated) output from just the subquery below:
> SELECT "spilled_coffee"/"passengers" AS "spills_per_person" FROM "schedule" GROUP BY "subway" name: schedule tags: subway=U2 time spills_per_person <---- The relevant field key for the main query ---- ----------------- 2017-01-25T18:00:00Z 0.029411764705882353 2017-01-25T18:05:00Z 0.05555555555555555 2017-01-25T18:10:00Z 0.13333333333333333 2017-01-25T18:15:00Z 0.16129032258064516 name: schedule tags: subway=U6 time spills_per_person <---- The relevant field key for the main query ---- ----------------- 2017-01-25T18:00:00Z 0.2 2017-01-25T18:05:00Z 0.1111111111111111 2017-01-25T18:10:00Z 0.05 2017-01-25T18:15:00Z 0.3157894736842105
Place specific conditions on the results of another query
Execute a function and return only those results that meet a specific condition. This use case is similar to SQL’s HAVING
clauses. The following query calculates the minimum number of passengers
at ten-minute intervals and returns only those minimum values that are greater than 15:
> SELECT "min" FROM (SELECT MIN("passengers") FROM "schedule" WHERE time >= '2017-01-25T18:00:00Z' AND time <= '2017-01-25T18:15:00Z' GROUP BY time(10m)) WHERE "min" > 15 name: schedule time min ---- --- 2017-01-25T18:10:00Z 19
Notice that the SELECT
and WHERE
clauses in the main query specify min
as the field key of interest. To understand how InfluxDB calculates those results, we’ve included the (annotated) output from just the subquery below:
> SELECT MIN("passengers") FROM "schedule" WHERE time >= '2017-01-25T18:00:00Z' AND time <= '2017-01-25T18:15:00Z' GROUP BY time(10m) name: schedule time min <---- The relevant field key for the main query ---- --- 2017-01-25T18:00:00Z 15 2017-01-25T18:10:00Z 19 <---- This is the only minimum value that gets returned
What else is there?
You can find the documentation for subqueries on the Data Exploration page. This is our first implementation of subqueries in InfluxQL - feel free to open an issue on GitHub if you find any unexpected behavior. We’d love to get your feedback!
What's next:
- Downloads for the TICK-stack are live on our "downloads" page.
- Deploy on the Cloud: Get started with a FREE trial of InfluxDB Cloud featuring fully-managed clusters, Kapacitor and Grafana.
- Deploy on Your Servers: Want to run InfluxDB clusters on your servers? Try a FREE 14-day trial of InfluxDB Enterprise featuring an intuitive UI for deploying, monitoring and rebalancing clusters, plus managing backups and restores.
- Tell Your Story: Over 300 companies have shared their story on how InfluxDB is helping them succeed. Submit your testimonial and get a limited edition hoodie as a thank you.