TL;DR InfluxDB Tech Tips - InfluxDB Kapacitor
By
Regan Kuchan /
Developer
May 26, 2016
Navigate to:
In this post we recap the week’s most interesting InfluxDB Kapacitor and TICK-stack related issues, workarounds, how-tos and Q&A from GitHub, IRC and the InfluxDB Google Group that you might have missed.
Preserving tags with INTO queries
Q: I’m using an INTO
query to move data to a new measurement. InfluxDB says it wrote the correct number of points, but when I query the data in the new measurement, I’m missing a point.
In my data, color
is a tag and name
is a field.
> SELECT * FROM "french_bulldogs"
name: french_bulldogs
---------------------
time color name
2016-05-25T00:05:00Z peach nugget
2016-05-25T00:05:00Z grey rumple
2016-05-25T00:10:00Z black prince
> SELECT * INTO "all_dogs" FROM "french_bulldogs"
name: result
------------
time written
1970-01-01T00:00:00Z 3
> SELECT * FROM "all_dogs"
name: all_dogs
--------------
time color name
2016-05-25T00:05:00Z grey rumple
2016-05-25T00:10:00Z black prince
What happened to nugget
?
A: If you use SELECT *
with INTO
, InfluxDB converts any tags in the current measurement to fields in the new measurement. This can cause InfluxDB to overwrite points that were previously differentiated by a tag.
In your case, InfluxDB converts color
from a tag to a field when it writes data to all_dogs
. Because nugget
and rumple
have the same timestamp and are no longer differentiated by a tag, InfluxDB overwrites the nugget
point with the rumple
point.
Use GROUP BY *
to preserve tags as tags with INTO
queries:
> SELECT "name" INTO "all_dogs" FROM "french_bulldogs" GROUP BY *
name: result
------------
time written
1970-01-01T00:00:00Z 3
> SELECT * FROM "all_dogs"
name: all_dogs
--------------
time color name
2016-05-25T00:05:00Z peach nugget
2016-05-25T00:05:00Z grey rumple
2016-05-25T00:10:00Z black prince
To stream or not to stream with Kapacitor
Q: I’m using Kapacitor to join stream data with shifted stream data and I write the joined data back to InfluxDB. Will this work if the shift covers a long time range (for example, 24 hours)? Can Kapacitor handle that amount of data in RAM?
var current_data = stream
| from()
.measurement('sparkle_trees')
var previous_data = stream
| from()
.measurement('sparkle_trees')
| shift(24h)
current_data
|join(previous_data)
.tolerance(10s)
.as('current_data','previous_data')
|influxDBOut()
.database('enchanted_forest')
.retentionPolicy('default')
.measurement('joined_data')
A: Your TICKscript will work but, in practice, it’s not a good idea. First, every time you restart Kapacitor you will have to wait 24 hours before it does any joining. Second, you will have to buffer 24 hours worth of data in RAM; this is impractical unless you have a very small ingestion rate.
We recommend using a batch task instead of a stream task. With a batch task, Kapacitor can simply query old data and start immediately processing the join when Kapacitor first starts up. In addition, Kapacitor only needs to fit data for a single batch in RAM.
For example:
var current_data = batch
| query('SELECT * FROM "enchanted_forest"."default"."sparkle_trees"')
.period(5m)
.every(5m)
var previous_data = batch
| query('SELECT * FROM "enchanted_forest"."default"."sparkle_trees"')
.period(5m)
.every(5m)
// Select data -24h from the current time
.offset(24h)
// Shift data to match current time
| shift(24h)
current_data
|join(previous_data)
.as('current_data','previous_data')
|influxDBOut()
.database('enchanted_forest')
.retentionPolicy('default')
.measurement('joined_data')
When to double quote identifiers in queries
Q: I’ve successfully written data to InfluxDB, but I get an unexpected error when I query my data:
> INSERT monsters exists=true
> SELECT exists FROM monsters
ERR: error parsing query: found EXISTS, expected identifier, string, number, bool at line 1, char 8
A: The field key exists
is an InfluxQL keyword. Identifiers that are also InfluxQL keywords must be double-quoted in queries:
> SELECT "exists" FROM monsters
name: monsters
--------------
time exists
2016-05-25T00:36:01.714188435Z true
2016-05-25T00:36:19.30227374Z true
See our docs for a complete list of InfluxQL keywords.
For more InfluxDB tips, check out our Frequently Encountered Issues page and feel free to post your questions in the InfluxDB users group.
What's next?
- Download and get started with InfluxDB v0.13
- Schedule a FREE 20 minute consultation with a Solutions Architect to review your InfluxDB project
- Attend one of our FREE virtual training seminars.
- Looking for InfluxDB Clustering on your infrastructure? Contact Sales to get a demo of InfluxDB Enterprise plus pricing information.