TL;DR InfluxDB Tech Tips - Creating A Subsample With A Continuous Query
By
Regan Kuchan /
Developer
Oct 06, 2016
Navigate to:
In this weekly post we recap the most interesting InfluxDB and TICK-stack related issues, how to create a subsample with a continuous query, workarounds, and Q&A from GitHub, IRC and the InfluxDB Google Group that you might have missed in the last week or so.
Changing a field's data type
Q: Can I change field’s data type?
A: Currently, InfluxDB offers very limited support for changing a field’s data type.
The <field_key>::<type>
syntax supports casting field values from integers to floats or from floats to integers. See Cast Operations for an example. There is no way to cast a float or integer to a string or boolean (or vice versa).
There are a couple workarounds if the cast operations are too limited for your use case. Note that these workarounds will not update data that have already been written to the database.
Option 1: Write the Data to a Different Field
The simplest workaround is to begin writing the new data type to a different field in the same series.
Option 2: Work the Shard System
Field value types cannot differ within a shard but they can differ across shards.
Users looking to change a field’s data type can use the SHOW SHARDS
query to identify the end_time
of the current shard. InfluxDB will accept writes with a different data type to an existing field if the point has a timestamp that occurs after that end_time
.
Note that this will not change the field’s data type on prior shards. For how this will affect your queries, please see how does InfluxDB handle field type discrepancies across shards.
Interpreting the FOR duration
error
Q: I’m trying to create a Continuous Query and I keep getting an error (see below). It says my FOR
duration must be greater than or equal to my GROUP BY time()
duration - but my FOR
duration is greater than my GROUP BY time()
duration! Any advice?
Continuous Query:
CREATE CONTINUOUS QUERY "max_pumpkins" ON "sales"
RESAMPLE EVERY 3h FOR 2h
BEGIN
SELECT max("pumpkins") INTO "max_pumpkins" FROM "patch" GROUP BY time(1h)
END
Error:
error parsing query: FOR duration must be >= GROUP BY time duration
A: Currently, InfluxDB returns that error whenever the FOR
duration is less than the execution interval . In your case, the execution interval is determined by the EVERY
duration, not the GROUP BY time()
duration (so, technically, the error should say FOR duration must be >= EVERY duration
).
Beyond the misleading error wording, the reason you’re getting the error is because your Continuous Query would miss data between execution times. If max_pumpkins
were to run as is, it would execute a query every three hours (that’s your EVERY
duration). That query would cover the time between now()
and now()
minus two hours (that’s your FOR
duration) - you’d end up not having results for every third hour. To avoid gaps in data coverage, you’ll need to increase your FOR
duration to 3h
.
If this configuration was intentional, there is an open feature request for Continuous Queries to support gaps in data coverage.
Creating a subsample with a Continuous Query
Q: I want to use a Continuous Query to move data with a particular tag from one measurement to another measurement and retention policy. I don’t want to manipulate the data in any way, but it looks like Continuous Queries require an aggregation function. Is there a way around this?
Raw data (where zombies
is a tag):
> SELECT * FROM "mydb"."two_days"."haunted_house"
name: haunted_house
-------------------
time scare_index zombies
2016-10-05T20:00:00Z 3 false
2016-10-05T20:00:00Z 12 true
2016-10-05T20:10:00Z 6 false
2016-10-05T20:10:00Z 13 true
2016-10-05T20:20:00Z 2 false
2016-10-05T20:20:00Z 11 true
What I’d like in a different retention policy/measurement:
> SELECT * FROM "mydb"."seven_days"."zombie_effects"
name: haunted_house
-------------------
time scare_index zombies
2016-10-05T20:00:00Z 12 true
2016-10-05T20:10:00Z 13 true
2016-10-05T20:20:00Z 11 true
A: Continuous Queries do require a function but not necessarily an aggregator function. You could use a selector function to satisfy the function requirement and not change the data in any way.
Assuming your data will always occur at ten-second intervals, the following CQ will give you your subsample:
CREATE CONTINUOUS QUERY "true_zombie" ON "mydb"
RESAMPLE EVERY 20s FOR 20s
BEGIN
SELECT first("scare_index")
INTO "mydb"."seven_days"."zombie_effects"
FROM "mydb"."two_days"."haunted_house"
WHERE "zombies" = 'true'
GROUP BY time(10s),*
END
For more InfluxDB tips, see our Frequently Asked Questions page and feel free to post your questions in the InfluxDB users group!
What's next?
- Download and get started with InfluxDB!
- Schedule a FREE 20 minute consultation with a Solutions Architect to review your InfluxDB project.
- Attend one of our FREE virtual training seminars.
- Got a question and need an immediate answer from the InfluxData Support team? Support subscriptions with unlimited incidents start at just $399 a month. Check out all the support options here.