InfluxDB and AWS – 100x More Powerful – Live Demonstration
Session date: Dec 14, 2023 08:00am (Pacific Time)
Together, InfluxData and AWS simplify the developer experience for time series data by enabling easier discovery, registration, and deployment of InfluxDB Cloud within AWS.
InfluxDB is specifically designed to handle time series data at scale and is up to 100x more powerful than non-specialized time series databases. With features such as optimized ingest performance, faster queries, and infinitely scalable storage persistence, InfluxDB for AWS is a highly efficient and cost-effective data storage solution.
In this demo, you will learn how customers use InfluxDB and AWS to power their infrastructure and applications and dive deeper into its technical aspects and capabilities. Discover how InfluxDB and AWS enable you to:
- Ingest and query data at a massive scale to power real-time analytics applications.
- Ingest billions of data points in real-time with unbounded cardinality.
- Easily connect and integrate with your favorite AWS tools like CloudWatch, EC2, ECS and Fargate, EKS, Kinesis, Lambda, and RDS.
[et_pb_toggle _builder_version=”3.17.6” title=”Transcript” title_font_size=”26” border_width_all=”0px” border_width_bottom=”1px” module_class=”transcript-toggle” closed_toggle_background_color=”rgba(255,255,255,0)”]
Here is an unedited transcript of the webinar “InfluxDB and AWS – 100x More Powerful.” This is provided for those who prefer to read rather than watch the webinar. Please note that the transcript is raw. We apologize for any transcribing errors. Speakers:
- Anais Dotis-Georgiou: Developer Advocate, InfluxData
- Ryan Harber: Sales Engineer, InfluxData
Anais Dotis-Georgiou: 00:04
Hello, everybody, and welcome. I’m Anais Dotis-Georgiou, and I’m a developer advocate at InfluxData. And today, we have joining us, Ryan Harber. I’ll introduce him in just a second, but I want to go over some housekeeping before we begin. So, this webinar is recorded, and it will be shared with you and available on demand in the next 24 hours. And you’ll also get the slides as well. And if you have any questions, I really would appreciate if you put them in the Q&A at the bottom of your screen. And we will answer all these questions at the end of this webinar. And please don’t forget to check out the InfluxDB community Slack workspace and the community forums. Tons of Influxers and other community members are answering each other’s questions, and any questions that you have about any of the material today or Influx in general, we can answer and help you there. So today, I’m extremely excited to introduce one of our customer-facing solution engineers. His name is Ryan Harber. And Ryan, the floor is yours. So, feel free to tell us a bit about yourself and what you’ve been working on.
Ryan Harber: 01:05
Awesome. Thank you, Anais. Yeah. Good morning, everyone. Nice to meet you all. My name’s Ryan Harber. I’m a sales engineer here with Influx. So, I work, as Anais said, on the customer-facing side alongside of our sales folks, but mostly on the technical front. So, these days, I’ve primarily been dealing with our latest version of the software, version 3.0, which I’m going to be speaking today a bit about. I’m currently based in Seattle, so it’s a bit dark. Apologies for that. But yeah. Looking forward to getting in the presentation today. And I will also have a quick demo for you guys to actually show you the platform. So, I think with that, I’m going to go ahead and kick off the presentation. Anais, do we want to give people a minute or two to join here, or should we kick it off?
Anais Dotis-Georgiou: 01:56
I think we can go ahead and get started.
Ryan Harber: 02:00
Sounds good. All right, everyone. So good morning to you all or afternoon, wherever you’re calling in from. Today, I’m going to be talking to you guys about InfluxDB, specifically version 3.0, a bit about our integration with AWS. And I’ll also be giving a brief demo of the platform and how you can leverage it with AWS services. So as a brief overview of what we’re going to cover today, we’ll talk about what InfluxDB is for those not already familiar. We will discuss our latest version of the platform, which is called 3.0. We’ll talk about integration with AWS. We will look at a real-world customer use case. And then I will give a brief demo of the 3.0 platform as well as integration with AWS. And today, we will look specifically at AWS Kinesis for the example I’ll show you. And then we’ll leave the remaining time for any Q&A, and Anais and I can help you there as best we can. All right. So first off, what is InfluxDB? So InfluxDB is a purpose-built database and platform for handling time series data at massive scale. The platform is comprised of four main components. So first off, we have the actual data collectors, which are designed to allow you to collect data so you can ingest it into InfluxDB.
Ryan Harber: 03:36
The first kind of primary one of those is called Telegraf, which is an open-source server-based agent that we designed for collecting and sending data from a variety of products, devices, and services. Telegraf is written in Go and compiles into a single binary with no external dependencies, and it requires a very minimal footprint. So that’s actually what we’re going to be using in the example today. The second section of data collectors is our catalog of client libraries. And we have a client library for all of the major programming languages. So, for 3.0, those currently include Go, Python, JavaScript, Java, and C#. The next section of the platform is our scripting languages. So, with 3.0, we currently have two scripting languages compatible. One of those is InfluxQL, which is a SQL-like query language that we designed at Influx to be optimized for use with time series data. The other, which is a new improvement of the 3.0 platform, is you can now use SQL right out of the box. The next section of the platform is our API. So, we have a single RESTful API, which enables developers to work across clouds, environments, and even different versions of Influx. In addition to accessing and querying the data using the API, our SaaS customers also use the API to accomplish administrative tasks. So, you can do things like create accounts and buckets.
Ryan Harber: 05:06
So, the last section of the platform, obviously, is we have the actual database itself. And so that’s what we’re going to spend most of our time in the next few slides discussing. As a brief primer, it’s available in three flavors virtually. So, you have our cloud serverless option, which is multi-tenant, the dedicated option, which is single-tenant, and then the clustered option, which is the on-premise offering of Influx. All right. Now we will discuss 3.0 specifically, which is the new database and storage engine that we launched in 2023. And it forms the core of our new platform. So first off, why did we build a 3.0 engine and what needs does it fulfill? V3 now offers unlimited cardinality and can be used as a single store for metrics, events, and traces, which we’ll get into a bit more later on. V3 improves real-time data analysis and action and is designed to deliver subsecond queries over massive sets of time series data. V3 offers long-term, low-cost data storage by storing all data as Apache Parquet files in object storage. More on this later as well. V3 now offers full SQL support, as I mentioned right out of the box, in addition to continued compatibility with InfluxQL. And lastly, V3 offers the flexibility to integrate and extend time series insights across open standards and data ecosystems.
Ryan Harber: 06:37
So, in this slide, we have an overview of our open architecture. Under the hood, we are leveraging DataFusion in V3 for query engine optimizations. So, for those not familiar, DataFusion is an in-memory query engine developed in Rust that utilizes Apache Arrow for its columnar format. We’ve adopted a couple elements of the Apache ecosystem, namely Apache Arrow for optimization of in-memory columnar data storage, as well as the Apache Parquet file format for extremely efficient storage and retrieval of time series data. The data persistence in Parquet format combined with object storage means you can actually leverage zero-copy data sharing for running things like machine learning tasks and other data science activities. It’s a lot of benefits you get from Parquet. All right. In this slide, as you can see, 3.0 is now an entirely Kubernetes-orchestrated software platform, and all of the different components of the database engine are decoupled from one another. So, this provides the ability to specifically scale each component both horizontally and vertically. For example, you can scale the ingest tier separately from the query tier and vice versa, depending on your particular use case. Ultimately, this enables users to more efficiently consume available resources within their environment.
Ryan Harber: 08:08
All right, here we have a common problem we often see with customers where they are using multiple copies of data to manage multiple different tools within their business. So, for example, they might use separate high-cost storage for a data lake in addition to machine learning tools, a visualization app, and maybe also long-term storage. The solution here is to use InfluxDB as a single data store for all of these tools. Multiple tools can access data persisted in Parquet using zero copy and zero-ETL. This overall decreases complexity within the business as well as increasing efficiency in building data pipelines. And it also decreases total cost of ownership when you can use a single store for these different tools. All right. And then here’s kind of a related— another common problem we see with customers where they’re using three separate data stores, each for their different metrics, events, and traces. So, kind of a different store for each different type of time series data they have within the business. Now, again, here, Influx V3 is a great solution to this problem. So V3 offers unlimited cardinality, and it can support all time series data types, allowing you to store metrics, events, and traces all in the same place. Again, this simplifies data pipelines and can drastically reduce total cost of ownership.
Ryan Harber: 09:41
All right, now we’ll talk a bit about integration with AWS. So, we host two different fully-managed cloud offerings on AWS. Our serverless environment is ideal for small and growing workloads. It has no required upfront provisioning. It leverages multi-tenant shared infrastructure, and it has usage-based pricing. Our cloud dedicated environment is ideal for enterprise workloads, and it offers a fully managed single-tenant database platform hosted in AWS with enterprise-grade security and private networking available. A small note on the dedicated piece: it will be available on Azure and GCP, hopefully, some time in 2024 as well. You can integrate with and ingest data from a variety of AWS services and tools. There are a few examples listed in the slide here. And today we’re actually going to take a look at AWS Kinesis integration specifically. And so more on that in a couple of slides here. All right, now I want to touch on an actual real-world customer use case briefly. So, we have a leader in the gaming industry that utilizes InfluxDB to collect and analyze data from a vast network of 32,000 machines distributed all over the world to track and analyze key metrics such as CPU usage, memory utilization, and connected player counts. They have over 500 megabytes a minute of data flowing into InfluxDB. And in general, this allows for real-time monitoring and management of server performance. And Influx also helps ensure efficient operation and optimal gaming experiences within the business.
Ryan Harber: 11:28
The integration of visualization tools like Grafana and custom dashboards provides game developers and publishers with immediate real-time insights into their game’s performance on the platform. This company also recently moved to Influx 3.0, and they gained the following results. They now have less than 300 milliseconds response time for queries against the last 30 minutes of data. So again, V3 offers very high performance on leading edge queries. They had a 50X performance improvement across the board. They had significant cost improvements. And lastly, they are able to offer longer retention now of high-fidelity data to inform internal and game studio decisions. And this longer retention was primarily enabled by 3.0’s extremely low-cost object storage on Parquet file format. All right. So now we’re going to jump into an actual demo of the platform, and then we’ll leave the remaining time for questions at the end. So as a quick primer, the goal of this demo today is going to be to send data into a Kinesis stream in AWS. For those not familiar, Kinesis is a fully managed real-time data streaming service offered by AWS and hosted there. And then once we have the data in the Kinesis stream, we’re going to use the Telegraf Kinesis integration to pull the data out of Kinesis and then actually write it into InfluxDB. And then we’ll query the data from our serverless environment just to make sure that it’s there.
Ryan Harber: 13:07
All right. I’m going to go ahead and share my other screen here. Okay. So, there’s a couple of pieces of this demo here. Primarily, we have the Kinesis stream, the Telegraf configuration, and then also where we’re actually hosting InfluxDB. So, we’re using the serverless multi-tenant environment to host InfluxDB. We’re going to run Telegraf on my local machine, but we’re going to use a Telegraf configuration hosted in our serverless environment. So, I’ll touch more on that later. And then for the Kinesis stream, we’re going to leverage a tool right here called the Kinesis Data Generator to actually generate data and send it into the Kinesis stream. So basically, this is synthetic data, right? So, we can control what the data looks like and how often we generate it. So real quick as an overview of the data that we’re going to send into the Kinesis stream to start things out here. Basically, we’re using a template here, which is built on top of Faker.js, I believe, in the Kinesis Data Generator over here. And so, this is the general format of the data that’s going to get uploaded into the Kinesis stream, which will eventually get pulled down by Telegraf and then eventually get sent over to InfluxDB.
Ryan Harber: 14:32
So, everything that goes into Influx eventually gets crushed into something called line protocol, which is a combination of four primary things. You have the measurement name here. You have the tag set here, which here we only have one tag, which is status. You have the field set, which here we only have one field, which is data. And then you ultimately have a timestamp with every single point that goes into Influx. So, we run a test template here. You can see what the actual line protocol is going to look like. So, we’re going to send these raw points into the Kinesis stream and then pull them off with Telegraf. So, I’m going to go ahead and start sending data. I already have the Kinesis stream in us-west-2 that I’m using here linked up to the data generator. So, all of this data is going to go directly into that stream. So, we can see right here, we’re sending a single record per second into the Kinesis stream. So, all of that data is actually landing there now. So now we just have to configure Telegraf to pull the data out and send it to Influx.
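As an illustration of the four parts of line protocol described above (measurement, tag set, field set, timestamp), here is a minimal Python sketch. The helper name `to_line_protocol`, the field value, and the timestamp are assumptions made for illustration, not the template shown in the webinar, and the sketch skips the escaping of spaces and commas that real line protocol requires.

```python
import time


def to_line_protocol(measurement, tags, fields, timestamp_ns=None):
    """Format one point as line protocol (illustrative: no character escaping)."""

    def format_field(value):
        # Line protocol marks integers with a trailing "i" and quotes strings.
        if isinstance(value, bool):
            return "true" if value else "false"
        if isinstance(value, int):
            return f"{value}i"
        if isinstance(value, float):
            return repr(value)
        return '"' + str(value) + '"'

    # Tags follow the measurement name, each introduced by a comma.
    tag_part = "".join(f",{key}={value}" for key, value in sorted(tags.items()))
    field_part = ",".join(f"{key}={format_field(value)}" for key, value in fields.items())
    # Timestamps are nanoseconds since the Unix epoch by default.
    timestamp = time.time_ns() if timestamp_ns is None else timestamp_ns
    return f"{measurement}{tag_part} {field_part} {timestamp}"


# One point in the "kinesis" measurement with a "status" tag and a "data" field.
point = to_line_protocol("kinesis", {"status": "ok"}, {"data": 42}, 1702569600000000000)
print(point)  # kinesis,status=ok data=42i 1702569600000000000
```

The single space separates the measurement and tags from the fields, and a second space separates the fields from the timestamp.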
Ryan Harber: 15:35
All right. So now I’m looking at the InfluxDB serverless environment, the multi-tenant front-facing web UI here. And right now, we are looking at the Telegraf section of the UI. So, if you go over here to basically start loading data, which is kind of the first piece of using Influx, there’s a section here where you can go to Telegraf. And you can actually use the Influx web UI here to help you generate a configuration, right? So, for example, you could go in here to create configuration, and you can see a huge list of all the plugins that you have available in Telegraf. So, these are essentially right out of the box, require usually just a couple lines of code to integrate with any of these services. So, a lot of these are AWS services, and so that really helps you leverage connection with AWS products. There’s one in here for Kinesis specifically right here. So, if we click on Kinesis, it’ll actually spin up a kind of a basic template configuration for integrating with a Kinesis stream. So, I already have this configuration created, but I’ll show you a quick overview of what that looks like.
Ryan Harber: 16:47
So, in general, all Telegraf agents are going to run on a Telegraf configuration. And there’s a couple main parts. You have the agent configuration, which actually specifies how the agent itself will do collection, batching, configure things like the interval, so how often you’re going to send data. So that is configured up here. We’re going to be collecting data every 10 seconds. And then the other important parts of the Telegraf configuration are two things, the input or the inputs and the outputs. You can add a variety of inputs and outputs. There’s no limit, right?
And you can use any of those services that Telegraf integrates with. Today, we have two particular outputs. So, we’re going to be outputting to stdout, which is just going to allow us to see, in the terminal instance, the data that’s going to be sent to Influx. So, if we didn’t have that on, you wouldn’t actually see it in terminal. It would just go straight to Influx. This next plugin here is how we’re going to send data from Telegraf to Influx. So, Telegraf is going to collect these points, batch them up, and then it’s going to send it to the Influx instance that we have specified here. So, what we’re using today is this cloud instance of Influx that we’re actually looking at here in the web UI. So, this is the URL here. I have the Influx token already configured in my terminal environment. So, you want to make sure to do that for authentication. And then you have to specify the organization and the bucket that you want to write to. And then from that point on, the line protocol that you write in will specify how the points actually get written.
Ryan Harber: 18:26
And then lastly, here, we have a section for the actual input that we’re using today, which is for the Kinesis consumer plugin, right? So, we have the region specified, and then we have the Kinesis access key and secret key stored in the terminal environment that we’ll be using. So, there’s a couple different ways or several ways that you can essentially authenticate with Amazon Kinesis. And those are listed here in kind of the template configuration. We’re going to just be using the access key and the secret key today for making it straightforward. You can use any of these to hook up to Kinesis. And then you just need the stream name and you need to specify the shard iterator type. More on the shard iterator type if you look that up with Kinesis. Latest is just pulling the latest data off of the stream. So now that we have this basic configuration for Telegraf set up, we are going to go over to a terminal instance on my machine where we’re going to actually run Telegraf, but then we’re going to run Telegraf with this configuration that’s stored in cloud. And so, the way to do that here is if you click setup instructions, you’ll see kind of an overview of the steps required.
Ryan Harber: 19:41
So, the main one is to get your Influx token exported to your environment. So that was the environment variable that you saw in the Telegraf configuration. Basically, anything prepended by a dollar sign is going to be interpreted as an environment variable. So, I already have this exported, but you’ll want to do that if you create your own Telegraf configuration here. And then the last step here is just to actually run Telegraf and then specify with this configuration flag, the configuration we want to use. So, you can run Telegraf out of the box, and then it will look for a Telegraf configuration on your machine. But if you specify the flag, then it will go ahead and go grab this and use that. So, what we’re going to do is copy this to the clipboard.
I’m going to jump over to terminal, and I’ll get that shared. All right. So, we can see here in terminal. Basically, now all we have to do, since I’ve already exported all the environment variables into the terminal environment, we just have to paste this. So, what this is going to do, again, is going to run Telegraf, specify the configuration, and we’re going to actually use this link here, which is using a cloud-hosted Telegraf configuration. And again, we already have data being sent into the Kinesis stream actively, right? So that’s just being basically queued up in the Kinesis stream. So as soon as we run Telegraf, it should start going and hooking up, authenticating with the Kinesis stream, and then pulling that data out and sending it to Influx.
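To make the dollar-sign convention concrete, the sketch below uses Python's `os.path.expandvars` to imitate how a `$INFLUX_TOKEN` reference resolves against the environment. This is an analogy for the behavior described above, not Telegraf's own implementation. The token value and the configuration URL are hypothetical, and the command is only assembled, not executed.

```python
import os

# Hypothetical token value; in practice you would export the real one in your shell.
os.environ.setdefault("INFLUX_TOKEN", "example-token")

# A configuration line that refers to the environment variable by name.
config_line = 'token = "$INFLUX_TOKEN"'

# expandvars replaces $NAME with the value of NAME from the environment.
resolved = os.path.expandvars(config_line)
print(resolved != config_line)  # True: the reference was substituted

# Placeholder URL standing in for the cloud-hosted configuration link from the UI.
CONFIG_URL = "https://cloud.example.invalid/telegraf-config"

# The run step: start Telegraf and point the --config flag at that configuration.
command = ["telegraf", "--config", CONFIG_URL]
print(" ".join(command))  # telegraf --config https://cloud.example.invalid/telegraf-config
```

Keeping the token in the environment means the configuration itself can be shared or hosted without exposing the credential.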
Ryan Harber: 21:13
All right. So, I went ahead and started Telegraf with that configuration. Again, the interval is on 10 seconds here. So, we’ll give that a couple of seconds to load. And then we have an actual stdout plugin enabled as well. And so that should actually show us the data coming directly.
Ryan Harber: 21:43
Okay. Great. So, it looks like everything’s working. Basically, what this is showing is the stdout output of the Telegraf configuration that we designed. And this is showing what those points actually look like. They’re going to be sent into Influx. So, the other output that we have is to an Influx instance, and in particular, one that we have stored in cloud. So, everything looks good here. What this is signifying is that we have data being pulled off of the Kinesis stream with Telegraf and then sent into InfluxDB in the cloud. And we’re also seeing it outputted to stdout here in the terminal. So now I’m going to switch back over to the cloud instance of Influx, the serverless environment. And we’re going to actually go ahead and query the data, make sure we have it there, right? And I’ll speak a little bit about SQL and the new integration with Influx 3.
Ryan Harber: 22:37
So as a quick start to query the data, you can go to the data explorer here in serverless. And we just need to do a couple things, right? We need to specify the bucket where we’re looking for data. And then we need to specify the measurement where we’re pulling data out of. So, if you recall from the Kinesis data generator here, the actual data that we’re sending in is in the Kinesis measurement, which is specified in the first section of the line protocol that we’re sending in. So, we can see that matches up. And then we specified the bucket in the Telegraf configuration. So, we have those two things specified here. And when you change these, if you turn SQL sync on, it will actually help you generate the SQL statement that you need to query that type of data out. So right now, we have actually a really basic query, but I’ll make this even more basic here. And we’ll just select essentially all the columns from the Kinesis table where the time is greater than an hour back from now, so the past hour, essentially. And you can specify that down here. So, when we run this, what essentially we’re going to see is all that data that we’ve been ingesting from Kinesis with Telegraf and then ingesting into Influx from Telegraf. So that’s all these points here, right? So, this is just going to start increasing because we’re sending a single record per second right now. So, you can see this start to build up.
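The query described above (all columns from the measurement, restricted to the past hour) can be sketched in Python as follows. The exact statement shown on screen is not in the transcript, so the SQL text is an assumption about its shape. The `run_query` helper assumes the `influxdb3-python` package is installed, and its host, token, and database arguments are placeholders you would supply; it is defined here but not called.

```python
def last_hour_query(measurement: str) -> str:
    """Build a SQL statement selecting every column from the past hour."""
    return (
        f'SELECT * FROM "{measurement}" '
        "WHERE time >= now() - interval '1 hour'"
    )


def run_query(sql: str, host: str, token: str, database: str):
    """Run the statement against InfluxDB 3.0 (assumes influxdb3-python is installed)."""
    from influxdb_client_3 import InfluxDBClient3  # imported lazily: optional dependency

    client = InfluxDBClient3(host=host, token=token, database=database)
    try:
        # language="sql" selects the SQL engine rather than InfluxQL.
        return client.query(query=sql, language="sql")
    finally:
        client.close()


print(last_hour_query("kinesis"))
# SELECT * FROM "kinesis" WHERE time >= now() - interval '1 hour'
```

Because new records arrive once per second in the demo, rerunning the same statement returns a growing number of rows.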
Ryan Harber: 24:11
All right. And now for kind of just a couple kind of quick queries that we can discuss here to make things a little more interesting. Basically, you can use SQL out of the box now with version three. So, anything that you can do with SQL, you can leverage with InfluxDB. So, we can do things like count the number of occurrences for each particular status tag. So, in this example, the points that we’re ingesting have one of three different tag values for this particular tag. They have fail, warn, or okay. All right. So, let’s say that we want to count the occurrences for each of those particular tag values over the past hour. We would do something like this. We would select the status tag from all the points, and then we would count them. And then we would select that from the kinesis measurement from the last hour, and then we would group all of that data by the status tag. So, what that’s going to do is it’s going to separate out all that data by each of the values of the status tag. So, if we run this, what you can see here is essentially a count broken down for each of those particular status tag values. And so, there’s three now, but if you were to write in another status tag value, for example, critical, then you would have another section here. So, a fourth section with the count for that section as well, right? So, this is dynamic. Again, everything that you write into InfluxDB is schema on read. So, you can change the points that are going in, and then the schema will reflect that down the line. All right. Well, that’s pretty much it for the demo. I do want to leave a couple minutes for Q&A here at the end. So, I think I will just open that up to the floor. Everyone, feel free to put questions in the chat as well, or jump on the mic and go ahead and ask away. Thanks, everyone, for your time today.
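The count-per-status query described above can be tried locally with a stand-in. The sketch below uses an in-memory SQLite table in place of InfluxDB, with made-up rows, purely to show how `GROUP BY` produces one row per distinct tag value and how a new value such as `critical` adds a fourth group. The table contents are assumptions, and the one-hour time filter from the demo is omitted for brevity.

```python
import sqlite3

# Same shape as the demo query: count the points for each value of the status tag.
COUNT_BY_STATUS = (
    "SELECT status, COUNT(*) AS n FROM kinesis GROUP BY status ORDER BY status"
)

conn = sqlite3.connect(":memory:")  # SQLite stands in for InfluxDB here
conn.execute("CREATE TABLE kinesis (status TEXT, data REAL)")
conn.executemany(
    "INSERT INTO kinesis VALUES (?, ?)",
    [("ok", 1.0), ("ok", 2.0), ("warn", 3.0), ("fail", 4.0)],  # made-up points
)

before = conn.execute(COUNT_BY_STATUS).fetchall()
print(before)  # [('fail', 1), ('ok', 2), ('warn', 1)]

# Writing a point with a new tag value creates a new group in the result.
conn.execute("INSERT INTO kinesis VALUES ('critical', 5.0)")
after = conn.execute(COUNT_BY_STATUS).fetchall()
print(after)  # [('critical', 1), ('fail', 1), ('ok', 2), ('warn', 1)]

conn.close()
```

Nothing in the query changes between the two runs; the extra row appears only because the data now contains a fourth status value.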
Anais Dotis-Georgiou: 26:13
Thank you, Ryan. So, we do have some Q&A that we answered, but I want to go over it so that everyone can benefit from the questions that are asked. So, one question we have is, how does InfluxDB 3.0 differ from InfluxDB 3 OSS? Essentially, 3.0 powers InfluxDB Cloud Serverless and InfluxDB Cloud Dedicated and will power the two OSS offerings that will come out later. And that’s InfluxDB Community and InfluxDB Edge. And I included a link, and you guys will have access to that link once this recording is over as well. And then we also have another question. Are there any supported tools for porting data stored in InfluxDB 1.x Enterprise to 3.0? And there are. There’s some migration tooling available as a part of the CLI. And so, I shared links to documentation for that as well. And someone else asked if we will get a copy of this recorded video or webinar later, and you absolutely will. So, anything that you feel like you missed or you want to review, you’ll have time to take a look at that later. And then we have another question in the chat, which is, can you do ANSI-standard joins to a lookup table? You can do joins. I can’t remember specifically to a lookup table. Ryan, do you know?
Ryan Harber: 27:49
I’m not sure on that one.
Anais Dotis-Georgiou: 28:13
Yeah. I’m not entirely sure about that. We do use the Postgres implementation of SQL. So, if that is supported there, then it should be. I do know that we have, I believe, different types of joins available. I’ll now go ahead and share the documentation for that.
Anais Dotis-Georgiou: 28:59
All right. I don’t quite see any more questions coming in, so I think it’s a good time to go ahead and close. But if you do have any additional questions, please reach out to us on the Influx Slack or our forums as well. Thank you so much, everyone, for attending. So, Ryan will be sending out an email tomorrow with the slides and this recording. And yes. Please reach out to us in our Slack channel and InfluxDB community. Thank you so much, Ryan, for your time, and thank you, everyone, for joining.
Ryan Harber: 29:31
Yeah. Thank you, guys. Happy holidays, everyone.
Anais Dotis-Georgiou: 29:34
Happy holidays. Bye.
Ryan Harber: 29:36
Cheers.
[/et_pb_toggle]