Client Library Deep Dive: Python (Part 2)
By Jay Clifford / Jul 28, 2023
Working with the new InfluxDB 3.0 Python CLI and Client Library
Okay, we are back for Part 2! Last time we discussed the new community Python library for InfluxDB 3.0. If you missed it, you can also watch it in video form.
Now for Part 2, let’s talk about a bolt-on application built with the client library at its core: the InfluxDB 3.0 Python CLI.
Python CLI
Okay, so following the same format as before, what were the reasons for building the CLI? Well, there are two primary reasons:
1. We wanted to give users a data browsing tool that leveraged the new Flight endpoint. Python gave us the opportunity to prototype fast before we invested work in a more robust CLI offering. It also allowed us to leverage some interesting data manipulation libraries that could extend the scope of the Python CLI.
2. We wanted a robust way to test the newly created InfluxDB 3.0 Python client library; as you will see, the CLI exercises most of the library's tooling and functionality.
Install
Let’s talk about the installation process because, I must admit, Python doesn’t provide the most user-friendly packaging and deployment methods unless you use it daily. I recommend installing the CLI in a Python virtual environment first for testing purposes:
$ python3 -m venv ./.venv
$ source .venv/bin/activate
$ pip install --upgrade pip
$ pip install influxdb3-python-cli
This set of commands creates our Virtual Python Environment, activates it, updates our Python package installer, and finally installs the new CLI.
If you would like to graduate from a Python virtual environment and add the CLI to your path, you can do so with a sudo install (be careful here not to cause permission issues with system packages):
sudo python3 -m pip install influxdb3-python-cli
Creating a CLI config
The first thing you want to do is create a connection config. This feature acts like the current influx CLI by saving your InfluxDB connection credentials for later use.
influx3 create config \
--name="poke-dex" \
--database="pokemon-codex" \
--host="us-east-1-1.aws.cloud2.influxdata.com" \
--token="<your token>" \
--org="<your org ID>"
| Flag | Description |
|---|---|
| `--name` | A name to describe your connection config. This must be unique. |
| `--token` | Provides authentication for the client to read and write from InfluxDB Cloud Serverless or Dedicated. Note: you need a token with read and write permissions if you wish to use both features. |
| `--host` | The InfluxDB host. This should be the domain only, without the protocol (https://). |
| `--org` | Cloud Serverless still requires a user’s organization ID for writing data to 3.0. Dedicated users can use an arbitrary string. |
| `--database` | The database you wish to query from and write to. |
Config commands
Config commands also exist to activate, update, delete, and list current active configs:
| Subcommand | Description |
|---|---|
| `update` | Updates an existing configuration. The `--name` parameter is required to specify which configuration to update. All other parameters (`--host`, `--token`, `--database`, `--org`, `--active`) are optional. |
| `use` | Sets a specific configuration as the active one. The `--name` parameter is required to specify which configuration to use. |
| `delete` | Deletes a configuration. The `--name` parameter is required to specify which configuration to delete. |
| `list` | Lists all the configurations. |
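For reference, here is roughly what those subcommands look like in practice. I'm assuming the same verb-first pattern as `create config` here, so treat the exact syntax as an assumption and check `influx3 --help` for the authoritative usage:

```bash
# Syntax assumed from the subcommand descriptions above; verify with influx3 --help
influx3 update config --name="poke-dex" --token="<your new token>"
influx3 use config --name="poke-dex"
influx3 list config
influx3 delete config --name="old-config"
```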
Writing and querying
You can use the CLI either by calling the application directly with the commands you wish to run, or through an interactive REPL. I personally believe the REPL approach provides a better flow, so let’s demo some of the features.
Once you’ve created your config, simply enter the following to activate the REPL:
influx3
Which leads to:
influx3
InfluxDB 3.0 CLI.
(>)
Query
Let’s first take a look at the query options. Within the REPL you have three query options: SQL, InfluxQL, and ChatGPT (more on this later). Let’s drop into the SQL REPL and run a basic query against the Trainer data we generated in the previous blog:
InfluxDB 3.0 CLI.
(>) sql
(sql >) SELECT * FROM caught
Now, I wouldn’t normally recommend querying without some form of time-based WHERE clause, but I wanted to highlight how the CLI can handle large datasets. It uses `mode="chunk"` from the Python client library to break large result sets into manageable Arrow batches. From there, we have three options:
1. Hit TAB to see the next portion of data, if one exists.
2. Press F to save the current Arrow batch to a file type of our choosing (JSON, CSV, Parquet, ORC, Feather).
3. Press CTRL-C to return to the SQL REPL.
Let’s take a look at option 2:
| 3961 | 82 | Venusaur | 83 | 80 | 0003 | 12 | 7 | 80 | 2023-07-06 13:41:36.588000 | ash | Grass | Poison |
| 3962 | 64 | Dratini | 45 | 41 | 0147 | 6 | 7 | 50 | 2023-07-06 14:30:32.519000 | jessie | Dragon | |
Press TAB to fetch next chunk of data, or F to save current chunk to a file
Enter the file name with full path (e.g. /home/user/sample.json): ~/Desktop/all-trainer-data.csv
Data saved to ~/Desktop/all-trainer-data.csv.
Here is a sample of the CSV file created:
"attack","caught","defense","hp","id","level","num","speed","time","trainer","type1","type2"
49,"Bulbasaur",49,45,"0001",12,"1",45,2023-07-06 14:30:41.886000000,"ash","Grass","Poison"
62,"Ivysaur",63,60,"0002",7,"1",60,2023-07-06 14:30:32.519000000,"ash","Grass","Poison"
62,"Ivysaur",63,60,"0002",8,"1",60,2023-07-06 14:30:38.519000000,"ash","Grass","Poison"
Once we reach the end of our dataset, it prompts us to press ENTER to drop back into the SQL REPL. Just remember if you feel like you’re pressing TAB forever, you can always drop out of the query with CTRL-C.
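Before we move on, here is a rough sketch of how that chunked paging maps onto the client library and pyarrow. This is my approximation for illustration, not the CLI’s actual source; only the use of mode = chunk is stated above, so the chunk handling and file-saving details are assumptions:

```python
from influxdb_client_3 import InfluxDBClient3
import pyarrow as pa
import pyarrow.csv as pacsv

client = InfluxDBClient3(
    host="us-east-1-1.aws.cloud2.influxdata.com",
    token="<your token>",
    org="<your org ID>",
    database="pokemon-codex",
)

# mode="chunk" asks the client to hand back the Flight result as Arrow
# batches rather than materialising everything at once (the exact return
# type may vary between client versions; an assumption for illustration).
chunk = client.query(query="SELECT * FROM caught", language="sql", mode="chunk")

# Once a batch is in hand, saving it is plain pyarrow.
table = pa.Table.from_batches([chunk.data])  # assumption: the chunk exposes a RecordBatch via .data
pacsv.write_csv(table, "all-trainer-data.csv")
```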
Now, let’s look at a more interesting example with the InfluxQL REPL:
(sql >) exit
(>) influxql
(influxql >) SELECT count(caught) FROM caught WHERE time > now() - 2d GROUP BY trainer
| | iox::measurement | time | trainer | count |
|---:|:-------------------|:--------------------|:----------|--------:|
| 0 | caught | 1970-01-01 00:00:00 | ash | 625 |
| 1 | caught | 1970-01-01 00:00:00 | brock | 673 |
| 2 | caught | 1970-01-01 00:00:00 | gary | 645 |
| 3 | caught | 1970-01-01 00:00:00 | james | 664 |
| 4 | caught | 1970-01-01 00:00:00 | jessie | 663 |
| 5 | caught | 1970-01-01 00:00:00 | misty | 693 |
(influxql >) SELECT count(caught) FROM caught WHERE time > now() - 2d GROUP BY time(1d),trainer ORDER BY time
| | iox::measurement | time | trainer | count |
|---:|:-------------------|:--------------------|:----------|--------:|
| 0 | caught | 2023-07-05 00:00:00 | ash | nan |
| 1 | caught | 2023-07-06 00:00:00 | ash | 625 |
| 2 | caught | 2023-07-07 00:00:00 | ash | 148 |
| 3 | caught | 2023-07-05 00:00:00 | brock | nan |
| 4 | caught | 2023-07-06 00:00:00 | brock | 673 |
| 5 | caught | 2023-07-07 00:00:00 | brock | 180 |
| 6 | caught | 2023-07-05 00:00:00 | gary | nan |
| 7 | caught | 2023-07-06 00:00:00 | gary | 645 |
| 8 | caught | 2023-07-07 00:00:00 | gary | 155 |
| 9 | caught | 2023-07-05 00:00:00 | james | nan |
| 10 | caught | 2023-07-06 00:00:00 | james | 664 |
| 11 | caught | 2023-07-07 00:00:00 | james | 157 |
| 12 | caught | 2023-07-05 00:00:00 | jessie | nan |
| 13 | caught | 2023-07-06 00:00:00 | jessie | 663 |
| 14 | caught | 2023-07-07 00:00:00 | jessie | 144 |
| 15 | caught | 2023-07-05 00:00:00 | misty | nan |
| 16 | caught | 2023-07-06 00:00:00 | misty | 693 |
| 17 | caught | 2023-07-07 00:00:00 | misty | 178 |
We will save this one as a Parquet file for later.
Write
Moving on from querying, let’s talk about the write functionality. This feature set isn’t as fleshed out as I would like it to be, but it covers the basics. We can drop into the write REPL and write data to InfluxDB using line protocol like so:
(influxql >) exit
(>) write
(write >) caught,id=0115,num=1,trainer=brock attack=125i,caught="KangaskhanMega Kangaskhan",defense=100i,hp=105i,level=13i,speed=100i,type1="Normal" 1688741473083000000
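Behind the scenes, the write REPL hands that line protocol string to the client library. Here is a minimal sketch of the equivalent call, assuming the client’s write() accepts raw line protocol via record= (my approximation, not the CLI’s actual code):

```python
from influxdb_client_3 import InfluxDBClient3

client = InfluxDBClient3(
    host="us-east-1-1.aws.cloud2.influxdata.com",
    token="<your token>",
    org="<your org ID>",
    database="pokemon-codex",
)

# The same point we entered in the write REPL, as a raw line protocol string.
lp = (
    'caught,id=0115,num=1,trainer=brock '
    'attack=125i,caught="KangaskhanMega Kangaskhan",defense=100i,hp=105i,'
    'level=13i,speed=100i,type1="Normal" 1688741473083000000'
)

client.write(record=lp)
```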
Next, let’s have a look at the write_file feature. For this, we need to drop out of the REPL entirely and pass flags when calling influx3. Let’s load our count results into a new table:
(write >) exit
(>) exit
Exiting …
influx3 write_file --help
usage: influx3 write_file [-h] --file FILE [--measurement MEASUREMENT] --time TIME [--tags TAGS]
options:
-h, --help show this help message and exit
--file FILE the file to import
--measurement MEASUREMENT
Define the name of the measurement
--time TIME Define the name of the time column within the file
--tags TAGS (optional) array of column names which are tags. Format should be: tag1,tag2
influx3 write_file --file ~/Desktop/count.parquet --time time --tags trainer --measurement summary
Here is the result:
(influxql >) SELECT count, trainer, time FROM summary
| | iox::measurement | time | count | trainer |
|---:|:-------------------|:--------------------|--------:|:----------|
| 0 | summary | 2023-07-05 00:00:00 | nan | ash |
| 1 | summary | 2023-07-05 00:00:00 | nan | brock |
| 2 | summary | 2023-07-05 00:00:00 | nan | gary |
| 3 | summary | 2023-07-05 00:00:00 | nan | james |
| 4 | summary | 2023-07-05 00:00:00 | nan | jessie |
| 5 | summary | 2023-07-05 00:00:00 | nan | misty |
| 6 | summary | 2023-07-06 00:00:00 | 625 | ash |
| 7 | summary | 2023-07-06 00:00:00 | 673 | brock |
| 8 | summary | 2023-07-06 00:00:00 | 645 | gary |
| 9 | summary | 2023-07-06 00:00:00 | 664 | james |
| 10 | summary | 2023-07-06 00:00:00 | 663 | jessie |
| 11 | summary | 2023-07-06 00:00:00 | 693 | misty |
| 12 | summary | 2023-07-07 00:00:00 | 148 | ash |
| 13 | summary | 2023-07-07 00:00:00 | 180 | brock |
| 14 | summary | 2023-07-07 00:00:00 | 155 | gary |
| 15 | summary | 2023-07-07 00:00:00 | 157 | james |
| 16 | summary | 2023-07-07 00:00:00 | 144 | jessie |
| 17 | summary | 2023-07-07 00:00:00 | 178 | misty |
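If you’d rather do the same thing straight from Python, the client library offers a similar file-writing helper (we covered it in Part 1). A rough equivalent of the CLI call above might look like the following; treat the parameter names as an approximation and check the influxdb3-python docs for your version:

```python
from influxdb_client_3 import InfluxDBClient3

client = InfluxDBClient3(
    host="us-east-1-1.aws.cloud2.influxdata.com",
    token="<your token>",
    org="<your org ID>",
    database="pokemon-codex",
)

# Rough equivalent of:
#   influx3 write_file --file count.parquet --time time --tags trainer --measurement summary
# Parameter names are an approximation; verify against the client library docs.
client.write_file(
    file="count.parquet",
    timestamp_column="time",
    tag_columns=["trainer"],
    measurement_name="summary",
)
```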
Experimental feature (ChatGPT)
So with ChatGPT and OpenAI being all the rage these days, I looked to see if their Python package could benefit the CLI. Interestingly, it does. Because InfluxDB has been open source since its inception, ChatGPT has become pretty well-versed in building InfluxQL queries. Take a look at this example:
(chatgpt >) give me a list of the top 10 caught with an attack higher than 100 from caught
Run InfluxQL query: SELECT * FROM caught WHERE attack > 100 LIMIT 10
| | iox::measurement | time | attack | caught | defense | hp | id | level | num | speed | trainer | type1 | type2 |
|---:|:-------------------|:---------------------------|---------:|:--------------------------|----------:|-----:|-----:|--------:|------:|--------:|:----------|:---------|:--------|
| 0 | caught | 2023-07-06 13:09:36.095000 | 110 | Dodrio | 70 | 60 | 0085 | 19 | 1 | 100 | jessie | Normal | Flying |
| 1 | caught | 2023-07-06 13:09:36.095000 | 125 | Pinsir | 100 | 65 | 0127 | 6 | 1 | 85 | brock | Bug | |
| 2 | caught | 2023-07-06 13:10:53.995000 | 130 | CharizardMega Charizard X | 111 | 78 | 0006 | 6 | 1 | 100 | brock | Fire | Dragon |
| 3 | caught | 2023-07-06 13:10:53.995000 | 150 | BeedrillMega Beedrill | 40 | 65 | 0015 | 12 | 1 | 145 | jessie | Bug | Poison |
| 4 | caught | 2023-07-06 13:10:53.995000 | 102 | Nidoking | 77 | 81 | 0034 | 20 | 1 | 85 | gary | Poison | Ground |
| 5 | caught | 2023-07-06 13:10:53.995000 | 105 | Primeape | 60 | 65 | 0057 | 16 | 1 | 95 | misty | Fighting | |
| 6 | caught | 2023-07-06 13:10:53.995000 | 120 | Golem | 130 | 80 | 0076 | 8 | 1 | 45 | ash | Rock | Ground |
| 7 | caught | 2023-07-06 13:10:53.995000 | 105 | Muk | 75 | 105 | 0089 | 5 | 1 | 50 | brock | Poison | |
| 8 | caught | 2023-07-06 13:10:53.995000 | 105 | Muk | 75 | 105 | 0089 | 19 | 1 | 50 | james | Poison | |
| 9 | caught | 2023-07-06 13:10:53.995000 | 105 | Muk | 75 | 105 | 0089 | 16 | 2 | 50 | james | Poison | |
This feature currently uses GPT-3.5 only and requires an OpenAI API token. If you would like instructions on how to use this feature, check out this part of the README.
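For a sense of how this kind of integration is wired up, here is a sketch using the openai package as it existed at the time of writing (the pre-1.0 ChatCompletion API). The prompt and surrounding code are my own illustration, not the CLI’s actual implementation:

```python
import openai

openai.api_key = "<your OpenAI API token>"

# Ask the model to translate a natural-language request into InfluxQL,
# then feed the result through the normal query path. The system prompt
# below is invented for illustration; the CLI's real prompt will differ.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "system",
            "content": "Translate the user's request into a single InfluxQL query "
                       "against the 'caught' measurement. Reply with the query only.",
        },
        {
            "role": "user",
            "content": "give me a list of the top 10 caught with an attack higher than 100 from caught",
        },
    ],
)

influxql_query = response.choices[0].message.content
print(f"Run InfluxQL query: {influxql_query}")
```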
Future hopes
The future is bright for the Python CLI as our development team pushes forward with tooling for InfluxDB 3.0. For now, the scope is to keep it as a bolt-on tool for Python developers and those who want an easily extendable CLI. Here is my current laundry list for the project:
| Feature | Status |
|---|---|
| Improve OpenAI functionality | TO DO |
| Find a better way to package and distribute. Currently looking into PyInstaller as an option. | TO DO |
| Extended write functionality | TO DO |
| Provide post-query exploration support (Pandas functions) | TO DO |
| Integrate Delta Sharing | TO DO |
Wrapping up
So there you have it, Part 2 done and dusted. I really enjoyed writing this blog series on both the Python client library and CLI. Having such a heavy hand in their creation makes writing about them far more exciting and much easier. I hope these blogs inspire you to get involved with our new community-based libraries and tooling. If you want to chat about how to get involved, you can reach me via Slack or Discourse.