Building a Telegraf Assistant - UC Berkeley Codebase
By Community / Use Cases, Product, Developer
Jan 27, 2021
This article was written by Codebase, a UC Berkeley student organization.
Hello InfluxData community! We are a team from Codebase, a UC Berkeley student organization that builds software projects for high-growth tech companies. This past semester, the eight of us had the incredible opportunity to work with InfluxData to add cloud-controlled configuration management features to Telegraf. Our goal was to take the hassle out of setting up config files, allowing users to directly edit their configurations through an easy-to-use web dashboard rather than editing the raw TOML files.
Telegraf Assistant in a nutshell
Right now, Telegraf starts and stops all of a user’s selected plugins at once. Additionally, any change to a plugin’s settings requires the user to shut down Telegraf, edit the configuration file, and then restart the service. To remedy this, we built the Telegraf Assistant, a tool that lets Telegraf users modify plugin configurations through HTTP operations. The Assistant is instantiated alongside the Telegraf Agent and can start, stop, update, or retrieve information about a plugin without stopping the Telegraf service. When a plugin is started, it is assigned a unique identifier. After instantiation, every plugin must be referenced through this API by its assigned identifier.
Plugin API
This implementation of Telegraf recognizes plugins by ID, not name. On startup, the Assistant reads the user’s configuration file and automatically assigns an ID to each plugin, which lets users run multiple instances of the same plugin.
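As a rough illustration of why IDs rather than names are needed, here is a minimal Go sketch of a registry that gives every plugin instance its own identifier. The registry and runningPlugin types, the register method, and the random hex ID format are all assumptions made for this example; they are not taken from the Assistant's source.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// runningPlugin is a hypothetical record for one plugin instance.
type runningPlugin struct {
	ID   string
	Name string
}

// registry maps generated IDs to plugin instances (illustrative only).
type registry struct {
	byID map[string]runningPlugin
}

func newRegistry() *registry {
	return &registry{byID: make(map[string]runningPlugin)}
}

// newID returns 16 hex characters of randomness. The real Assistant's
// ID format may differ; this format is an assumption.
func newID() (string, error) {
	buf := make([]byte, 8)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

// register assigns a fresh ID to a plugin instance and stores it.
func (r *registry) register(name string) (string, error) {
	id, err := newID()
	if err != nil {
		return "", err
	}
	r.byID[id] = runningPlugin{ID: id, Name: name}
	return id, nil
}

func main() {
	reg := newRegistry()
	// Two instances of the same plugin, as could appear in one config file.
	for _, name := range []string{"cpu", "cpu", "mem"} {
		id, err := reg.register(name)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> %s\n", name, id)
	}
}
```

Because the map is keyed by ID, both cpu instances can be addressed separately, which a name-keyed lookup could not do.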
Each request is a CRUD operation. The full list of available operations is: starting a plugin, stopping a plugin, updating a plugin, getting plugin details, getting a plugin schema, retrieving all available plugins, and retrieving all running plugins. Examples of request payloads and responses for each operation can currently be found in assistant/README.md in the Codebase fork of the Telegraf repo on GitHub.
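The seven operations listed above could be modeled as a small set of named constants. In the Go sketch below, only UPDATE_PLUGIN is a name that appears in this article; every other operation name is a guess patterned on it, and the describe helper is hypothetical. The authoritative names are in assistant/README.md.

```go
package main

import "fmt"

// operation is the value of the "operation" field in a request.
type operation string

const (
	// UPDATE_PLUGIN is the only name confirmed by this article.
	opUpdatePlugin operation = "UPDATE_PLUGIN"

	// The names below are hypothetical, patterned on UPDATE_PLUGIN.
	opStartPlugin       operation = "START_PLUGIN"
	opStopPlugin        operation = "STOP_PLUGIN"
	opGetPlugin         operation = "GET_PLUGIN"
	opGetPluginSchema   operation = "GET_PLUGIN_SCHEMA"
	opGetAllPlugins     operation = "GET_ALL_PLUGINS"
	opGetRunningPlugins operation = "GET_RUNNING_PLUGINS"
)

// descriptions pairs each operation with the action named in the article.
var descriptions = map[operation]string{
	opStartPlugin:       "start a plugin",
	opStopPlugin:        "stop a plugin",
	opUpdatePlugin:      "update a plugin",
	opGetPlugin:         "get plugin details",
	opGetPluginSchema:   "get a plugin schema",
	opGetAllPlugins:     "retrieve all available plugins",
	opGetRunningPlugins: "retrieve all running plugins",
}

// describe rejects any operation that is not in the known set.
func describe(op operation) (string, error) {
	d, ok := descriptions[op]
	if !ok {
		return "", fmt.Errorf("unknown operation %q", op)
	}
	return d, nil
}

func main() {
	for _, op := range []operation{opStartPlugin, opUpdatePlugin, "RESTART_AGENT"} {
		d, err := describe(op)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s: %s\n", op, d)
	}
}
```

Validating the operation name before doing any work keeps unknown requests from reaching plugin code.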
Here is an example of updating a plugin using the Plugin API. The uuid refers to the generated id of the operation request.
// REQUEST PAYLOAD
{
  "operation": "UPDATE_PLUGIN",
  "plugin": {
    "id": "x14t-54...",
    "type": "input",
    "config": "<changed values struct here>"
  },
  "uuid": "213894y123..."
}
// RESPONSE
{
  "status": "success",
  "data": {
    "plugin": {
      "id": "x14t-54...",
      "name": "cpu",
      "config": "<updated config struct here>"
    }
  },
  "uuid": "213894y123..."
}
Any change made to a plugin, including enabling a new plugin or disabling an existing one, is saved to the user’s configuration file, so the same settings are retained the next time Telegraf starts.
What we've learned
This semester, we experienced a real remote working environment, learning to use tools such as VS Code Live Share, Zoom, Discord, and GitHub to communicate effectively both within our own developer team and with our InfluxData points of contact. We were also exposed to new technologies, such as using CircleCI to verify our code and ensure that the features we added didn’t break any existing functionality. Moreover, we learned to pay attention to the specific requirements of both Linux and Windows systems. To ensure that all Telegraf users have the same experience, we needed to refactor our work for Windows compatibility as well. Lastly, the developers on our team learned a great deal about the intricacies of Golang-specific data structures and packages. Since only one or two classes on campus primarily use Golang, this was a great opportunity for us to learn more about the language and its advantages.
What's next for the Telegraf Assistant?
Our Telegraf Assistant source code and documentation can be found in our fork of the Telegraf GitHub repository, codebase-berkeley/telegraf. The InfluxData Telegraf team will review our code, make any changes needed for it to work in production, and eventually merge it into the Telegraf product. A major next step for them is getting the Assistant to work in large enterprise environments. Tim, the VP of Products, mentioned that many of InfluxData’s customers operate tens of thousands of Telegraf agents, and getting the Telegraf Assistant to work at that scale will be the next challenge for the InfluxData engineering team.
If you’re interested in collaborating on the Telegraf Assistant project, check out our code and all the details about this new feature! Please leave any comments on our PR; the InfluxData team will take that feedback into account when implementing the final version of the Telegraf Assistant.
Thank you InfluxData team!
We want to take this chance to thank the amazing people who made this project and collaboration possible! Ryan, thank you for being the first person to introduce Codebase to InfluxData this past summer, and Barbara, thank you for directing us to the Telegraf team and the awesome projects that you’re all working on. Samantha, Dave, and Jess, a huge thank you for taking the time to meet with us and hold standup every single week, and for being reliable points of contact who helped us with any logistical or technical problems super quickly. We also appreciate the fun memories, such as having Samantha and a few engineers speak to our club members about their experiences working at InfluxData, joking with Jess about Survivor at meetings, and hearing from Dave that we got a shoutout at InfluxDays North America 2020.