TL;DR InfluxDB Tech Tips - Querying Tag Value, Data Across Measurements, Understanding Timestamps
By
Regan Kuchan /
Developer
Jun 16, 2016
Navigate to:
In this post we recap the week’s most interesting InfluxDB and TICK-stack related issues, workarounds, how-tos and Q&A from GitHub, IRC and the InfluxDB Google Group that you might have missed.
Working with Data Across Measurements
Q: I have two measurements: cow_groups
and cow_activity
. I want to GROUP BY
a tag in cow_groups
and calculate the average of a field in cow_activity
.
This is how my data are organized:
> SELECT * FROM cow_groups
name: cow_groups
----------------
time cow_group_id cow_id value
2016-06-12T23:10:00Z consistent producer 3 1
2016-06-12T23:10:00Z food motivated 2 1
2016-06-12T23:10:00Z food motivated 1 1
2016-06-12T23:10:00Z cool kids 1 1
> SELECT * FROM cow_activity
name: cow_activity
------------------
time cow_id grass_consump milk_prod
2016-06-12T23:10:00Z 1 10 60
2016-06-12T23:10:00Z 3 9 70
2016-06-12T23:10:00Z 2 7 71
I know InfluxDB doesn’t do JOIN
s but is there a way to perform something like the following query?
SELECT mean("milk_prod") FROM "cow_activity" GROUP BY "cow_groups"."cow_group_id"
A: You are correct. InfluxDB 0.9+ doesn’t support JOIN
s. Instead, all series in a given measurement are automatically merged unless distinguished by explicit filters in the WHERE
clause or the GROUP BY
clause.
There is no way to perform cross-measurement math or grouping. All data must be under a single measurement to query it together. We recommend re-organizing your schema so that all of your data live in one measurement. InfluxDB is not a relational database and mapping data across measurements is not a great schema.
Recommended schema (cow_group_id
and cow_id
are tags):
> SELECT * FROM "cows"
name: cows
----------
time cow_group_id cow_id grass_consump milk_prod
2016-06-12T23:10:00Z consistent producer 3 9 70
2016-06-12T23:10:00Z food motivated 2 7 71
2016-06-12T23:10:00Z food motivated 1 10 60
2016-06-12T23:10:00Z cool kids 1 10 60
# That schema makes this query possible:
> SELECT mean("milk_prod") FROM "cows" GROUP BY "cow_group_id"
name: cows
tags: cow_group_id=consistent producer
time mean
---- ----
1970-01-01T00:00:00Z 70
name: cows
tags: cow_group_id=cool kids
time mean
---- ----
1970-01-01T00:00:00Z 60
name: cows
tags: cow_group_id=food motivated
time mean
---- ----
1970-01-01T00:00:00Z 65.5
Querying Tag Values
Q: I successfully wrote several points to the database and a simple SELECT *
query works, but when I specify a tag in the WHERE
clause I get no results. What am I missing?
> INSERT farm,patch_id=1 blueberries=15 1465948800000000000
> INSERT farm,patch_id=2 blueberries=32 1465948800000000000
> INSERT farm,patch_id=1 blueberries=12 1465952400000000000
> SELECT * FROM "farm"
name: farm
----------
time blueberries patch_id
2016-06-15T00:00:00Z 15 1
2016-06-15T00:00:00Z 32 2
2016-06-15T01:00:00Z 12 1
> SELECT * FROM "farm" WHERE time <= now() + 5h and "patch_id"=1
> SELECT * FROM "farm" WHERE time <= now() + 5h and "patch_id"="1"
>
A: Use single quotes to specify a tag value in the WHERE
clause:
> SELECT * FROM "farm" WHERE "patch_id"='1'
name: farm
----------
time blueberries patch_id
2016-06-15T00:00:00Z 15 1
2016-06-15T01:00:00Z 12 1
Understanding Timestamps Returned by GROUP BY time() queries
Q: My query has a 20 minute GROUP BY time()
interval and specifies a 20 minute time range in the WHERE
clause. Interestingly, I get results for two time intervals instead of just one. What’s going on?
> SELECT max("cookies") FROM "my_kitchen" WHERE time >= '2016-06-14T15:30:00Z' and time <= '2016-06-14T15:40:00Z' GROUP BY time(20m)
name: my_kitchen
----------------
time max
2016-06-14T15:20:00Z 20
2016-06-14T15:40:00Z 18
A: You’re getting two results because of the way InfluxDB handles time boundaries with GROUP BY time()
queries.
By default, InfluxDB returns rounded calendar time boundaries. In your case InfluxDB automatically groups together 2016-06-14T15:20:00Z
and 2016-06-14T15:30:00Z
as the first 20 minute interval and 2016-06-14T15:40:00Z
and 2016-06-14T15:50:00Z
as the next 20 minute interval. Note that the results in the max
column only consider data that fall within the time range in the WHERE
clause.
You can alter the default rounded calendar time boundaries by including an offset interval in your GROUP BY time()
clause. The following query tells InfluxDB to add ten minutes to the default time boundaries so that the results begin at 2016-06-14T15:30:00Z
instead of 2016-06-14T15:20:00Z
:
> SELECT max("cookies") FROM "my_kitchen" WHERE time >= '2016-06-14T15:30:00Z' and time <= '2016-06-14T15:40:00Z' GROUP BY time(20m,10m)
name: my_kitchen
----------------
time max
2016-06-14T15:30:00Z 20
Check out Data Exploration for documentation on the offset interval.
For more InfluxDB tips, check out our Frequently Asked Questions page and feel free to post your questions in the InfluxDB users group.
What's next?
- Download and get started with InfluxDB v1.0 Beta
- Schedule a FREE 20 minute consultation with a Solutions Architect to review your InfluxDB project
- Attend one of our FREE virtual training seminars.
- Looking for InfluxDB Clustering on your infrastructure? Contact Sales to get a demo of InfluxDB Enterprise plus pricing information.