How to Write Points from CSV to InfluxDB v2 and InfluxDB Cloud
By Anais Dotis-Georgiou / Product, Use Cases, Developer / Jul 29, 2021
There are several ways to write CSV to InfluxDB v2 and InfluxDB Cloud, including:
- The Telegraf File Plugin
- The CLI
- The csv.from() Flux function
- Any client library
I will demonstrate how to write CSV to an InfluxDB Cloud Free Tier account with these methods because signing up for a Free Tier account is the fastest way to get started with InfluxDB Cloud. However, the process is identical for InfluxDB v2 OSS. For each one of those write methods in the list above, we will write the following CSV to InfluxDB Cloud:
_time,region,host,usage_user,_measurement
2018-05-08T20:50:00Z,east,A,15.43,cpu
2018-05-08T20:50:20Z,east,A,59.25,cpu
2018-05-08T20:50:40Z,east,A,52.62,cpu
Using the Telegraf File Plugin to write CSV to InfluxDB Cloud
Telegraf is InfluxData’s plugin-driven server agent for collecting and reporting metrics. There are over 200 input plugins, which means there are many ways to get data into InfluxDB. However, I frequently see new Influx users inquiring about how to write points from CSV to InfluxDB on the InfluxData Community Site. Writing points from a CSV file is an easy way to insert familiar data into InfluxDB, which can make it easier to get acquainted with the platform.
First, I need to generate a Telegraf config file with the appropriate input and output plugins. As per the Getting Started with Telegraf documentation, I will use the following command in the terminal in the directory of my choosing:
telegraf -sample-config -input-filter file -output-filter influxdb_v2 > writeCSV.conf
- The -sample-config flag generates the Telegraf config file.
- The -input-filter and -output-filter flags specify the input and output plugins, respectively.
- The text following > names the config file.
After running the command, I open writeCSV.conf. My telegraf config is complete with the File Input Plugin and the InfluxDB v2 Output Plugin.
Change the input portion of your telegraf config to look like this:
[[inputs.file]]
## Files to parse each interval. Accept standard unix glob matching rules,
## as well as ** to match recursive files and directories.
files = ["/path/to/writeCSV.csv"]
## Data format to consume.
## Each data format has its own unique set of configuration options, read
## more about them here:
## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
data_format = "csv"
## Indicates how many rows to treat as a header. By default, the parser assumes
## there is no header and will parse the first row as data. If set to anything more
## than 1, column names will be concatenated with the name listed in the next header row.
## If `csv_column_names` is specified, the column names in header will be overridden.
csv_header_row_count = 1
## Indicates the number of rows to skip before looking for header information.
## Our sample CSV starts with its header row, so there is nothing to skip.
csv_skip_rows = 0
## Indicates the number of columns to skip before looking for data to parse.
## These columns will be skipped in the header as well. Every column in our
## sample CSV is used, so none are skipped.
csv_skip_columns = 0
## The character reserved for marking a row as a comment row
## Commented rows are skipped and not parsed
csv_comment = "#"
## The column to extract the name of the metric from
csv_measurement_column = "_measurement"
## Columns listed here will be added as tags. Any other columns
## will be added as fields.
csv_tag_columns = ["host","region"]
## The column to extract time information for the metric
## `csv_timestamp_format` must be specified if this is used
csv_timestamp_column = "_time"
## The format of time data extracted from `csv_timestamp_column`
## this must be specified if `csv_timestamp_column` is specified
csv_timestamp_format = "2006-01-02T15:04:05Z07:00"
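The csv_timestamp_format value above is a Go reference-time layout (Telegraf is written in Go): you describe your timestamp format by writing out the reference instant Jan 2 15:04:05 2006 in your format. If you want to sanity-check that your timestamps match the layout before running Telegraf, the rough Python equivalent of this particular layout, for the UTC "Z"-suffixed timestamps in our sample CSV, is the strptime pattern below. This is an illustrative sketch, not part of Telegraf:

```python
from datetime import datetime, timezone

# Go layout "2006-01-02T15:04:05Z07:00" corresponds to RFC 3339 timestamps.
# For the UTC ("Z"-suffixed) timestamps in our sample CSV, the matching
# strptime pattern is:
STRPTIME_PATTERN = "%Y-%m-%dT%H:%M:%SZ"

def check_timestamp(value: str) -> datetime:
    """Parse one CSV timestamp, raising ValueError if it doesn't match."""
    return datetime.strptime(value, STRPTIME_PATTERN).replace(tzinfo=timezone.utc)

for ts in ["2018-05-08T20:50:00Z", "2018-05-08T20:50:20Z", "2018-05-08T20:50:40Z"]:
    check_timestamp(ts)  # raises ValueError if the format is wrong
```

If any row's timestamp doesn't match the layout, Telegraf will fail to parse it, so a quick check like this can save a debugging round trip.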
You’ll need to gather the following account information to configure the output portion of your telegraf config:
- Bucket: A bucket is where data is stored. You can think of it as a database. All buckets have retention policies, which you assign upon bucket creation. The easiest way to create a bucket is through the InfluxDB Cloud user interface (UI), as described here. However, you can also use the CLI or API to create a bucket. This bucket will be referred to as our destination bucket, "mybucket".
- Token: An InfluxDB authentication token ensures security for users and their data. Tokens belong to organizations and describe permissions. You'll need to create a read/write or all-access token scoped to your destination bucket. The easiest way to create a token is through the InfluxDB Cloud UI, as described here. However, you can also use the CLI or API to create a token.
- Organization: By default, your organization is the email address you used to set up your InfluxDB Cloud account.
Change the output portion of your telegraf config to look like this:
# Configuration for sending metrics to InfluxDB 2.0
[[outputs.influxdb_v2]]
## The URLs of the InfluxDB cluster nodes.
##
## Multiple URLs can be specified for a single cluster, only ONE of the
## urls will be written to each interval.
## ex: urls = ["http://127.0.0.1:8086"] if you're using OSS
urls = ["https://us-west-2-1.aws.cloud2.influxdata.com"]
## Token for authentication.
token = "mytoken"
## Organization is the name of the organization you wish to write to.
organization = "[email protected]"
## Destination bucket to write into.
bucket = "mybucket"
To test that your Telegraf configuration is working as expected, use the --test flag. This command outputs the metrics in line protocol, the input format for InfluxDB, to stdout. It's often worth running this command to verify that your Telegraf configuration and resulting line protocol are correct before writing data to InfluxDB.
telegraf --config writeCSV.conf --test
If you're new to Telegraf, it's worth being aware of the debug option under the agent portion of your config. This option is set to false by default:
debug = false
Set this value to true to output detailed logs about Telegraf. This option is extremely valuable for debugging Telegraf.
Finally, run telegraf and pass in the path to your config to write the CSV data to your InfluxDB Cloud account:
telegraf --config $PWD/writeCSV.conf
Using the CLI to write CSV to InfluxDB Cloud
To write a CSV to InfluxDB Cloud with the CLI, you must first download the influx CLI. Then use the following command to write a CSV to your destination bucket:
influx write -b mybucket \
-t mytoken \
-o [email protected] \
-f path/to/writeCSV.csv \
--header "#datatype dateTime:2006-01-02T15:04:05Z07:00,tag,tag,double,measurement"
Supply the header flag to provide annotations or metadata about your CSV. Inject annotation headers in order of column appearance from your CSV. We identify the timestamp column and format, the tag columns, the field columns by their respective data type (double, float, int, etc.), and finally the measurement column.
To avoid having to provide your authentication credentials each time you use the InfluxDB CLI, I encourage you to create a CLI configuration. Creating and using a CLI config is the recommended way to interact with the InfluxDB CLI. This approach is described in detail in TL;DR InfluxDB Tech Tips – How to Use the CLI Locally to Access the Flux REPL and Write a Regular CSV to InfluxDB Cloud.
Finally, the influx write dryrun command prints your data as line protocol to stdout without writing it to InfluxDB. It's often worth running this command to ensure that your headers and resulting line protocol are correct before writing to InfluxDB. In the example below, I have already set up an InfluxDB CLI config, so I don't need to pass in my org or token. The dryrun command outputs my line protocol and helps me verify that my influx write will be successful:
influx write dryrun -b mybucket -f ~/path/to/writeCSV.csv --header "#datatype dateTime:2006-01-02T15:04:05Z07:00,tag,tag,double,measurement"
cpu,host=A,region=east usage_user=15.43 1525812600000000000
cpu,host=A,region=east usage_user=59.25 1525812620000000000
cpu,host=A,region=east usage_user=52.62 1525812640000000000
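To see how the annotated columns map onto line protocol, here is a small stdlib-only Python sketch (not part of the influx CLI) that rebuilds the dryrun output above from the sample CSV: the measurement column becomes the measurement name, the tag columns become comma-separated tags, the double column becomes the field, and the timestamp is converted to nanosecond epoch time.

```python
import csv
import io
from datetime import datetime, timezone

CSV_DATA = """\
_time,region,host,usage_user,_measurement
2018-05-08T20:50:00Z,east,A,15.43,cpu
2018-05-08T20:50:20Z,east,A,59.25,cpu
2018-05-08T20:50:40Z,east,A,52.62,cpu
"""

def csv_to_line_protocol(text: str) -> list[str]:
    """Convert the sample CSV to line protocol:
    measurement,tag=...,tag=... field=value timestamp_ns"""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        ts = datetime.strptime(row["_time"], "%Y-%m-%dT%H:%M:%SZ")
        ns = int(ts.replace(tzinfo=timezone.utc).timestamp()) * 1_000_000_000
        tags = f"host={row['host']},region={row['region']}"
        lines.append(f"{row['_measurement']},{tags} usage_user={row['usage_user']} {ns}")
    return lines

for line in csv_to_line_protocol(CSV_DATA):
    print(line)
# Prints the same three lines as the dryrun output above.
```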
Using the csv.from() Flux function to write CSV to InfluxDB Cloud
The data scripting language for InfluxDB is called Flux. Flux enables you to…
- Query your data from multiple sources
- Analyze and transform your data
- Write custom functions and scripts
- Execute those scripts on a schedule or create tasks
- Monitor data and send alerts
- ...and more!
Today, we’ll use the csv.from() function to convert a CSV to annotated CSV, the InfluxDB output format, in InfluxDB Cloud. Then we’ll apply some additional Flux functions to format the output so that we can visualize it in the InfluxDB UI. Finally, we’ll use the to() function to actually write the data to our destination bucket.
import "csv"
csvData = "
_time,region,host,usage_user,_measurement
2018-05-08T20:50:00Z,east,A,15.43,cpu
2018-05-08T20:50:20Z,east,A,59.25,cpu
2018-05-08T20:50:40Z,east,A,52.62,cpu
"
csv.from(csv: csvData, mode: "raw")
|> rename(columns: {usage_user: "_value"})
|> map(fn: (r) => ({ r with _value: float(v: r._value) }))
|> map(fn: (r) => ({ r with _time: time(v: r._time) }))
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> to(bucket: "mybucket")
First, you must import the Flux CSV package. There are two ways you can pass CSV data into the csv.from() function:
- Store your CSV data in a variable.
- Specify the path to your CSV. Please note that the CSV must exist in the same file system as the process running Flux. In other words, this approach doesn't work in the InfluxDB UI for InfluxDB Cloud.
It's also worth being aware of a second csv.from() function in the Flux Experimental package, which allows you to read annotated CSV from a remote URL.
We store our raw CSV data in a variable called csvData and use the first csv.from() function to convert our raw CSV to annotated CSV. To learn more about annotated CSV, the output format of Flux queries, please read TL;DR InfluxDB Tech Tips – How to Interpret an Annotated CSV.
If you run the csv.from(csv: csvData, mode: "raw") line on its own in the InfluxDB UI and toggle on the raw data view, you can inspect the annotated CSV output. The csv.from() function reads every value as a string type, as indicated by the #datatype annotation row.
- Then we use the rename() function to rename the usage_user column to "_value". This column is renamed because the InfluxDB UI visualizes columns named "_value" by default.
- Because csv.from() reads our data as strings, we use the map() function to apply the float() function to each record in the "_value" column, converting the string values to floats.
- Similarly, the map() function applies the time() function to each record in the "_time" column. The time() function converts the string values to times.
- The range() function filters the data by the time range selected in the UI before the to() function writes it to our destination bucket.
Using the Python client to write CSV to InfluxDB Cloud
The InfluxDB Python Client supports pandas DataFrames, which I believe is the easiest way to write a CSV to InfluxDB. Your code would look like this:
import pandas as pd
from influxdb_client import InfluxDBClient
from influxdb_client.client.write_api import SYNCHRONOUS

client = InfluxDBClient(url="https://us-west-2-1.aws.cloud2.influxdata.com", token="mytoken", org="myorg")
write_client = client.write_api(write_options=SYNCHRONOUS)

df = pd.read_csv("~/Desktop/writeCSV.csv")
df['_time'] = pd.to_datetime(df['_time'], format="%Y-%m-%dT%H:%M:%SZ")
df = df.set_index('_time')

write_client.write("mybucket", "[email protected]", record=df, data_frame_measurement_name="_measurement",
                   data_frame_tag_columns=['region', 'host'])
After installing pandas and the InfluxDB v2 Python Client, import the libraries. Then instantiate the InfluxDBClient by providing the URL of your InfluxDB instance, your token, and your org. Use the to_datetime() function to convert the timestamp column of your dataframe to datetime objects, and set that column as the index of your dataframe with the set_index() function. Finally, use the write method to write data to InfluxDB: specify the destination bucket and org, pass your dataframe to the record parameter, and identify your measurement column and tag columns.
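Under the hood, the client serializes the dataframe to line protocol and POSTs it to InfluxDB's /api/v2/write endpoint. If you'd rather not pull in the client library, a minimal stdlib-only sketch of that request looks like the following. The URL, token, org, and bucket are placeholders matching the examples above, and the request is built but deliberately not sent:

```python
import urllib.parse
import urllib.request

def build_write_request(url, org, bucket, token, lines):
    """Build (but don't send) a POST to the InfluxDB v2 write endpoint."""
    query = urllib.parse.urlencode({"org": org, "bucket": bucket, "precision": "ns"})
    body = "\n".join(lines).encode("utf-8")
    return urllib.request.Request(
        f"{url}/api/v2/write?{query}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "text/plain; charset=utf-8",
        },
    )

req = build_write_request(
    "https://us-west-2-1.aws.cloud2.influxdata.com",
    "[email protected]", "mybucket", "mytoken",
    ["cpu,host=A,region=east usage_user=15.43 1525812600000000000"],
)
# To actually write, you would call urllib.request.urlopen(req).
```

Every method in this post, Telegraf, the CLI, Flux's to() function, and the client libraries, ultimately funnels line protocol through this same write endpoint.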
Conclusions about writing CSV to InfluxDB v2 Cloud and OSS
I hope this InfluxDB blog post helps you get started using InfluxDB and writing CSV to it. If you are using InfluxDB and need help, please ask on our community site or Slack channel. If you're developing a cool IoT application or monitoring your application on top of InfluxDB, we'd love to hear about it, so make sure to share your story! Additionally, please share your thoughts, concerns, or questions in the comments section. We'd love to get your feedback and help you with any problems you run into!
Further reading
While this post aims to provide a comprehensive overview of how to write CSV to InfluxDB, the following resources might also interest you:
- TL;DR InfluxDB Tech Tips – How to Use the CLI Locally to Access the Flux REPL and Write a Regular CSV to InfluxDB Cloud: This post shares how to use the InfluxDB CLI with InfluxDB Cloud to write a CSV to InfluxDB.
- TL;DR InfluxDB Tech Tips – How to Interpret an Annotated CSV: This post describes how to interpret an Annotated CSV, the Flux query result format for InfluxDB.
- Getting Started with Python and InfluxDB v2.0: This post describes how to get started with the Python Client Library.
- Getting Started with InfluxDB and Pandas: This post shares how to get started with Pandas to write dataframes to InfluxDB and return dataframes with a Flux query.
- Top 5 Hurdles for Flux Beginners and Resources for Learning to Use Flux: This post describes common hurdles for Flux beginners and how to tackle them by using the InfluxDB UI, understanding Annotated CSV, and more.
- Top 5 Hurdles for Intermediate Flux Users and Resources for Optimizing Flux: This post describes common hurdles for intermediate and advanced Flux users while providing more detail on pushdown patterns, how the Flux engine works, and more.