Solving Runaway Series Cardinality When Using InfluxDB
By Rick Spencer / Product, Use Cases, Developer / Oct 01, 2020
In this post, you’ll learn what causes high series cardinality in a time series database and how to locate and eliminate the culprits. First, for those just encountering this concept, let’s define it: series cardinality is the number of unique database, measurement, tag set, and field key combinations in an InfluxDB instance. Because high series cardinality is a primary driver of high memory usage for many database workloads, it is important to understand what causes it and how to resolve it.
The symptoms of runaway cardinality
People find out that they have a cardinality problem in one of two ways:
- They hit cardinality limits on InfluxDB Cloud.
- They notice that reads and sometimes writes are getting slower and slower on InfluxDB Cloud or InfluxDB OSS 2.0.
When you write to InfluxDB, it uses the measurements and tags in your data to create indexes that speed up reads. However, when too many indexes are created, both writes and reads can actually start to slow down.
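For reference, here is a simple line protocol point (the measurement, tag, and field names are hypothetical):

cpu,host=server01,region=us-west usage_user=23.2,usage_system=11.5

Here, cpu is the measurement, host and region are tags, and usage_user and usage_system are fields. The measurement and tags are indexed; the fields are not.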
Finding the source(s) of runaway cardinality
Indexes are created for each unique combination of tag values, combined with each field name. This is described in more detail in Red Flags of High Cardinality and in Data Layout and Schema Design Best Practices for InfluxDB. Cardinality is calculated, more or less, with a formula like this:
(count(tag1 values) * count(tag2 values) * ... * count(tagN values)) * count(field names)
Note that this formula may overestimate series cardinality in some cases because of the presence of dependent tags, as explained in the Red Flags of High Cardinality post mentioned above.
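For example, assuming a hypothetical bucket with a host tag that has 10 values, a region tag that has 4 values, and 2 field names, the worst-case series cardinality would be:

(10 * 4) * 2 = 80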
Look for low-hanging fruit with runaway series cardinality
Usually, there is a single tag that is causing the runaway cardinality. Typically, this is a tag that ends up having a unique value for almost every entry.
Some common culprits are:
- Using a log message as a tag value, but it turns out that the log message has a timestamp, pointer value, or other unique string.
- Using a timestamp as a tag. This is typically done by accident in client code; an example of this case is sketched after this list.
- Using a tag value that is not unique per entry, but has a very high number of possible values. For example, tagging each entry with a user id is fine, unless you have hundreds of thousands or millions of users.
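Here is a hypothetical line protocol point (measurement and tag names invented for illustration) where a timestamp has leaked into a tag value:

requests,endpoint=/login,at=2020-10-01T12:34:56Z status=200i

Because the at tag changes on every write, each point creates a brand-new series, and cardinality grows without bound. Moving at into a field, or dropping it entirely in favor of the point’s own timestamp, removes the problem.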
Digging into suspicious tags with InfluxDB
If no tag is immediately suspicious, then you simply need to find the values of the cardinality equation in order to track down the source of your runaway cardinality. You may need to do this for multiple buckets if you are not certain which bucket is causing the runaway cardinality.
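If you are not sure which buckets you have, a minimal Flux sketch to list them is:

buckets()
  |> keep(columns: ["name"])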
Finding the field names is easy. You can use this query:
import "influxdata/influxdb/v1"

v1.fieldKeys(bucket: "my-bucket", start: -100y)
  |> count()
Normally, though, tag values are the culprit. Here is a query that will count the number of values for each tag in a bucket:
import "influxdata/influxdb/v1"

cardinalityByTag = (bucket) =>
  v1.tagKeys(bucket: bucket)
    |> map(fn: (r) => ({
      tag: r._value,
      _value: if contains(set: ["_stop", "_start"], value: r._value) then
        0
      else
        (v1.tagValues(bucket: bucket, tag: r._value)
          |> count()
          |> findRecord(fn: (key) => true, idx: 0))._value
    }))
    |> group(columns: ["tag"])
    |> sum()

cardinalityByTag(bucket: "my-bucket")
The above query will show you which tags contribute the most to cardinality, and there is a good chance that one will be orders of magnitude higher than the others. However, if you are experiencing runaway cardinality, this query may time out before completing the calculation. If that happens, you can perform the following steps one at a time instead.
First, you can generate a list of tags with this query:
import "influxdata/influxdb/v1"

v1.tagKeys(bucket: "my-bucket")
  |> yield(name: "tags")
Then, for each tag, find how many values are associated with the tag with a query like this:
import "influxdata/influxdb/v1"

v1.tagValues(bucket: "my-bucket", tag: "my-tag")
  |> count()
These queries should give you the numbers you need to identify the sources of cardinality in each of your buckets. To determine which specific tags are growing, check the cardinality again after 24 hours to see if one or more tags have grown significantly.
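You can also track a bucket’s total series cardinality directly. As a sketch, assuming a version of Flux recent enough to include the influxdata/influxdb package’s cardinality() function:

import "influxdata/influxdb"

influxdb.cardinality(bucket: "my-bucket", start: -100y)

Running this before and after a schema change, or 24 hours apart, gives you a single number to compare.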
Fixing your schema
Usually, solving your runaway cardinality is as simple as changing an offending tag to a field. If you are worried about how this will impact query performance, solutions for that are discussed below.
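As a sketch, using a hypothetical visits measurement, converting userId from a tag to a field is just a change to how you write the data. Before, with userId as a tag, each user creates a new series:

visits,userId=abc123 duration=42i

After, with userId as a field (note that string field values are quoted, while tag values are not):

visits userId="abc123",duration=42i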
Deleting data to reduce high cardinality
Sometimes you don’t actually need the data that is causing high cardinality. In these cases, you can use the delete API endpoint to delete some data, or simply delete the whole bucket. Note that it can take up to a few hours for your account to reflect the decreased cardinality.
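For example, here is a sketch of a call to the v2 delete endpoint; the host, organization, bucket, token, time range, and predicate are all placeholders to replace with your own values:

curl --request POST "https://cloud2.influxdata.com/api/v2/delete?org=my-org&bucket=my-bucket" \
  --header "Authorization: Token $INFLUX_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{"start": "2020-01-01T00:00:00Z", "stop": "2020-10-01T00:00:00Z", "predicate": "_measurement=\"my-measurement\""}'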
Designing schema for read performance
Generally, the value of using a tag is to create indexes, and the value of creating indexes is that during a query, the query engine does not need to scan every single record in the database. However, as described above, having too many indexes creates its own performance problems. So, the trick is to create a middle ground between scanning and indexing.
For example, let’s say you often want to query for specific user ids, but you have hundreds of thousands of users. A simple query like this:
from(bucket: "my-bucket")
  |> range(start: -7d)
  |> filter(fn: (r) => r._field == "userId" and r._value == "abcde")
where userId is a field, will cause InfluxDB to scan every row in the selected time range looking for matching userId values. However, if your schema includes a tag that can be reasonably indexed, you can reduce the number of rows scanned considerably. For example, let’s say you have hundreds of thousands of users, but each user can be categorized by type, location, company, and so on. You can include a tag that has many fewer values, so it can be reasonably indexed. As a result, this query:
from(bucket: "my-bucket")
  |> range(start: -7d)
  |> filter(fn: (r) => r.companyTag == "Acme")
  |> filter(fn: (r) => r._field == "userId" and r._value == "abcde")
could execute much faster: because companyTag is indexed, that set of rows can be retrieved quickly, and the subsequent scan for userId covers far fewer rows.
Hopefully this article has helped you solve your cardinality problems and get your InfluxDB project back on track. Once you understand the contributors to cardinality, and their relationship to your queries, you can design your system to work faster.