Upgrading Made Easy: Moving to InfluxDB 2.x or InfluxDB Cloud with Cribl LogStream
Session date: Mar 16, 2021 08:00am (Pacific Time)
Many organizations agree that migrating workloads to the cloud or to a newer version of existing tooling can result in cost savings and flexibility. A well-designed observability pipeline is often the key to a quick and painless transition, leading to positive impacts on cost optimization, data visibility, and performance. Cribl’s LogStream product helps teams implement such an observability pipeline.
In this hands-on technical discussion, the audience will learn how to leverage Cribl LogStream to successfully upgrade from InfluxDB 1.x to InfluxDB 2.x or move to InfluxDB Cloud. Join us as we walk through the pros and cons of workload migration, share architecture best practices, and give a live demo on how to combine Cribl LogStream with the latest version of InfluxDB.
Watch the Webinar
Watch the webinar “Upgrading Made Easy: Moving to InfluxDB 2.x or InfluxDB Cloud with Cribl LogStream” by filling out the form and clicking on the Watch Webinar button on the right. This will open the recording.
Transcript
Here is an unedited transcript of the webinar “Upgrading Made Easy: Moving to InfluxDB 2.x or InfluxDB Cloud with Cribl LogStream”. This is provided for those who prefer to read rather than watch the webinar. Please note that the transcript is raw. We apologize for any transcription errors.
Speakers:
- Caitlin Croft: Customer Marketing Manager, InfluxData
- Steve Litras: Director of Technical Marketing, Cribl
Caitlin Croft: 00:00:04.795 Hello, everyone. Welcome again to today’s webinar. I am super excited to have Steve from Cribl joining us to talk about how to use their product to make upgrading your InfluxDB even easier. My name is Caitlin Croft. I work here at InfluxData. Please feel free to post any questions you may have in the chat and Q&A. Without further ado, I’m going to hand things off to Steve.
Steve Litras: 00:00:33.843 All right. Thanks, Caitlin. Hi, everybody. I won’t say good morning or good afternoon because it’s probably all over the board here. So hi, everybody. I’m going to talk to you today about using our product to help you with your InfluxDB upgrade. There we go. So a little bit about me, first. I say 20-plus years, but it’s really more years than I care to admit, in IT. I’ve run infrastructure teams. I’ve run architecture teams. I’ve run applications teams. I’ve been both an individual contributor and in middle and upper management. I was an early adopter of Splunk. And back in 2006, there really weren’t many of these kinds of tools around. And I was managing a lab for about 300 developers and enterprise app configurators and having all sorts of problems with that lab. I came across an ad, installed Splunk, fed my logs to it, and started coming in every morning and typing “error”. And suddenly, I went from the guy who was getting beat up because of all the problems to the rock star who was solving all the problems before anybody saw them. That really convinced me of the power of observability data and how we could use it to make our operations better.
Steve Litras: 00:02:00.218 Currently, I’m the director of technical marketing at Cribl. I’ve been here a year, a year and some change, and really happy to be here talking to you guys today. So let’s face it, upgrading core systems is not easy. You run into all sorts of problems. And I don’t think it’s any different with a monitoring system versus something like a finance system or literally any application that you’re dealing with, some sort of transactional capability. So first, you need to understand how the new version is going to run differently. Do you need different resources? Do you need more or less disk space? Do you need to deploy in the cloud? What do you need to make this thing work? And it’s especially hard if you can’t replicate production conditions. So we’ve all evaluated software in a test lab, come away with it very happy. And then once we put it into production, we discover all sorts of things that we didn’t want to. So it’s important that you try and replicate your production environment. You’ve also got to decide on your approach for the upgrade. Sure, you could do an upgrade in place. But if something happens, if a catastrophic failure happens, what’s your escape plan? If you’re like me, you’ve been through the case where you have to go back to backups. Even taken from nearline storage, a backup takes a while to restore. And more than likely, you’re going to blow through any window you had in that upgrade, that planned maintenance time. You could also use snapshots, but even the best snapshots have their own challenges that you need to work around.
Steve Litras: 00:03:45.301 Now, if you’re not doing an upgrade, if you’re building a new environment, you need to think about data migration. Actually, you need to think about it either way because, even in an upgrade, you’re going to have to think about steps like, potentially, database upgrades or tasks around the upgrade. But if you’re doing it not as an upgrade, but as building a new instance, you need to think about how you’re going to get your old data to that system. This is the part where, if we were in person, I’d ask for a show of hands of who recognizes this reference. It’s a reference to Apollo 11, the US landing people on the moon in 1969. A 1202 error was something that the lunar lander hit on the way down. Now, those guys had run simulations for months, probably hundreds, if not thousands of times. They had simulated that landing to the nth degree. But problems still happen.
Steve Litras: 00:04:44.616 And that really just shows that the test lab and real life are not the same. Companies like InfluxData or Cribl, we try and put out bug-free products. But the reality is that every environment’s different. There are variables in every environment that are maybe not present in other environments, and bugs happen. So you need to be able to recover from a problem happening in production. Now, what if we could do this? What if we could make it easier? What if we could make it so you could iterate over that upgrade in an easy fashion? Right. Do it once. If things don’t work, scrap the whole thing, and do it again. Do it until you’re comfortable with it. What if instead of doing an upgrade in place or doing a switchover where data stops going to the original system, what if we could, effectively, make a copy of all that data and send it to both the new and the old system so that you had the ability to either go back when you needed to, or you could continue to do this upgrade over and over until you’re comfortable - going back to the iteration discussion. We can do that. And if you’re in that kind of model where you’re sending stuff to both instances, now your outage window really just becomes the cutover, and more than likely, it’s a DNS change. So your maintenance window becomes however long it takes for your DNS to propagate.
Steve Litras: 00:06:25.943 And finally, what if we could give you the ability to replay data, old data, back into the system at any given point in time. This gives you the ability to do things like, “I want to do my data migration before or after I go live.” That’s an option if I can replay that data. So guess what, all this and more is available to you. This is Cribl LogStream. Cribl LogStream is an observability pipeline in a box. It sits between your observability sources like Kafka, Telegraf, Fluentd, Beats - any of those systems - and your destinations, like all the ones on the right you see here, including, of course, InfluxDB. It can handle routing of data. So it can receive any of the data from these inputs. It can route that data to any of these outputs. It can transform the data [inaudible] for the destination. So if you have two different destinations that really require the data to have a different shape, we can do that on the fly. We can do that at the destination layer so that you’re getting the same data to each system, but it’s optimized for that system.
Steve Litras: 00:07:43.787 And on top of that, we have this ability to do replay. So in replay, you can actually - with replay, you can actually separate your system of retention from your system of analysis. Now, this is probably less impactful with something like a metrics store or a time series database because you’re not storing all these raw logs. But in a lot of environments, you have a Splunk or Elastic system that is taking all the logs and storing them for retention, at the same time using infrastructure that’s optimized for search. So it’s a really expensive way to retain your data. But if you could play all that stuff to an inexpensive storage location like S3 or Glacier and be able to replay it at any point in time, that retention requirement becomes a whole lot easier. And in most cases, it’s going to save you a fair amount of money as well. So reimagining an upgrade, we spin up an Influx - sorry. We have this environment where we have a number of sources on the left, Splunk, Telegraf, whatever. They’re feeding data through Cribl LogStream, which has a routing table and pipelines. A routing table decides where things go, and pipelines make any changes to that data. So we’re sending data through this environment. We’ve got two pipelines that are feeding two different destinations. We’ve got one that’s feeding InfluxDB 1, and one that’s feeding Amazon S3. And that’s our [inaudible] pipeline. So now, when we want to do an upgrade, we can very easily just introduce the second instance of InfluxDB. In the case of what we’re demoing, it’s InfluxDB 2 on-prem, if you will. But there’s no reason that’s not InfluxDB Cloud. As far as we’re concerned, it’s the same thing.
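To make the destination side concrete before the demo, here is a minimal Python sketch of what a write to the InfluxDB 2.x HTTP API looks like: line protocol in the request body and a token sent as an Authorization header, rather than the 1.x username and password. The host, org, bucket, token, and metric name below are placeholders for illustration; InfluxDB Cloud works the same way, just with a Cloud URL.

```python
import time
import requests

# Placeholder connection details; swap in your own URL, org, bucket, and token.
INFLUX_URL = "http://influx2.example.com:8086"
ORG = "my-org"
BUCKET = "cribl"
TOKEN = "MY_TOKEN"

def write_metric(measurement, fields, tags=None):
    """Write one point in line protocol to the InfluxDB 2.x /api/v2/write endpoint."""
    tag_str = "".join(f",{k}={v}" for k, v in (tags or {}).items())
    field_str = ",".join(f"{k}={v}" for k, v in fields.items())
    line = f"{measurement}{tag_str} {field_str} {int(time.time())}"
    resp = requests.post(
        f"{INFLUX_URL}/api/v2/write",
        params={"org": ORG, "bucket": BUCKET, "precision": "s"},
        headers={"Authorization": f"Token {TOKEN}"},  # token auth instead of user/password
        data=line,
    )
    resp.raise_for_status()

# Hypothetical metric, roughly the shape a logs-to-metrics pipeline would emit.
write_metric("http_bytes", {"sum": 1043, "count": 1}, tags={"host": "web-01"})
```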
Steve Litras: 00:09:40.291 Now, we have the same data going to both systems. We have this concept of a post-processing pipeline, and that’s - I mentioned that we can modify data as it’s going to the destination to make it optimized for that destination. That is a post-processing pipeline. A pipeline is really just a series of functions. We have a number of functions out of the box. And you can also develop your own, potentially, that will modify data, do aggregations on data, all sorts of things. So once we’re happy, now, because we have all this data going to both places, hey, guess what? You’re already upgraded. Right. We have a second system taking exactly the same data, sharing the format of the data so that if I look at either one, they look the same, except I don’t have all of my old data. So that’s where replay comes in. And we use a collector, which we’ll talk about, to get that data back from S3 and rerun it through the system, so we can backfill data from where we cut over, back as far as our data goes. And then finally, once we’re happy with the systems, we can run in parallel as long as we want. Once we’re ready, we just destroy the old, first-version instance. And boom, we’re running in our end state. We’ve got everything running to InfluxDB 2. We’re still archiving data in the pipeline - or archiving data to our S3 bucket, and everything’s working as we expected.
Steve Litras: 00:11:26.090 So let’s talk a little bit about the demo environment or the environment that I built to show you how this stuff works. We have, in essence, a Cribl LogStream deployment. We are a distributed system, although we can work as a non-distributed system. But we generally are a distributed system where we have a master node that manages all configuration, and then we have a set of worker groups. And worker groups are really just compute that’s bound by the same configuration. All data can go through any instance of a worker group and get treated the same way. On the left, we have a source configured for data generation, and that’s really just feeding Apache logs. It’s generating and feeding Apache logs into the system. On the right, the very first output is our InfluxDB 1 version. And as you can see with the arrows, it’s actually receiving data already. Then on the bottom, we have an S3 archive bucket, which is also receiving data. There are two things in here that I pre-configured because I don’t trust myself to type live. It’s not that they’re complex to configure. And I’ll show you both of them, so you can see that. It’s that I didn’t want to make a mistake cutting and pasting. So we have the InfluxDB 2 instance. And I have Grafana sitting on top of both of these just to show the queries because that’s a pretty common pattern out there. But whatever tool is sitting on top of it, the same caveats apply. And then on the left, this little guy over here is a collector. A collector is a configuration we use for going back to, in this case, the archive bucket. Our collectors can read S3. They can read file systems. They can also use scripts. So you can write a script to do collection. And we also have a REST API collector available, so you can use it to go get anything that’s available via REST API. We do that for a lot of enrichment use cases.
Steve Litras: 00:13:25.420 Now, to talk about dataflow, right now, we have the data generator, which comes in through our routing table. And you’ll notice that there’s a color difference between the dataflow and the first pipeline and the second pipeline. So that first pipeline makes a copy of the data, and the route determines this by deciding whether it’s a final route or not a final route. And a final route means that the original data will go through that route. A not-final route means that a copy of the data will go through that route. So as the data comes in and hits that first route, it gets copied and sent through the pass-through pipeline, which is really a no-op pipeline. It doesn’t really do anything to the data. It just feeds it to the archive bucket. The original data continues down the routing table to the next pipeline, which is logs-to-metrics. And that route knows to send the final version of that data, the original version, through it. Logs-to-metrics is a set of functions that extracts the data from those logs, identifies all the numeric data in there, and then builds aggregates on it. So things like byte count, time taken, all those kinds of things end up getting turned into metrics that we’re then feeding to InfluxDB directly.
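As an illustration of what a logs-to-metrics transformation amounts to, here is a small Python sketch that parses Apache combined log lines, pulls out the numeric fields, and rolls them up into a few aggregate metrics. This is not Cribl’s implementation; the regex is simplified and the metric names are assumptions.

```python
import re
from collections import defaultdict

# Simplified Apache combined log pattern; the numeric captures (status, bytes)
# are the fields we turn into metrics.
APACHE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def extract_numeric_fields(line):
    """Parse one access-combined line and return its numeric fields."""
    m = APACHE_RE.match(line)
    if not m:
        return None
    return {
        "status": int(m.group("status")),
        "bytes": 0 if m.group("bytes") == "-" else int(m.group("bytes")),
    }

def aggregate(lines):
    """Roll raw log lines up into counts and sums, roughly what a
    logs-to-metrics pipeline would send downstream instead of the raw events."""
    metrics = defaultdict(float)
    for line in lines:
        fields = extract_numeric_fields(line)
        if fields is None:
            continue
        metrics["http.requests.count"] += 1
        metrics["http.bytes.sum"] += fields["bytes"]
        metrics[f"http.status.{fields['status']}.count"] += 1
    return dict(metrics)

sample = ['127.0.0.1 - - [16/Mar/2021:08:00:00 -0700] "GET /index.html HTTP/1.1" 200 1043']
print(aggregate(sample))
```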
Steve Litras: 00:14:52.453 All right. Let’s go ahead and jump over to the demo. I’m going to stop sharing here and jump over to my browser. All right. I’ll hit Reload in case my login timed out. So what you’re seeing here is the Cribl LogStream UI. As I mentioned, we are set up in a distributed mode. So we have worker groups. We have one worker group in this case, the DC logs worker group. But I’m going to jump over to monitor. And you can see the throughput of our system. This is a demo system, so it’s really just feeding one type of data. But we have examples of this at pretty high scale as well. If I go to destinations, you’ll see all the destinations I have configured, and you’ll notice I have a [decimal?] configuration. That’s just kind of a - it’s kind of a catch-all for anything I’m not routing otherwise. But I have data going to InfluxDB 1. I have no data going to InfluxDB 2. And I have data going to my S3 bucket. So you can see the way things are working right now. Now, if I go over to my worker groups, let me show you the things that I said I configured ahead of time. But let’s start with the collector - or let’s start with the InfluxDB 2 instance.
Steve Litras: 00:16:23.851 So I’m going to destinations, and I’m just looking at this configuration. It’s a pretty straightforward configuration. You put in the right API URL. You define the database name. We set our stuff up with a database called Cribl - or a bucket called Cribl. Everything else is pretty much default here with the exception of one advanced setting to set our token. So that’s a difference between Influx 1 and Influx 2. Instead of using the username and password, I’m actually using a token here, and I’m sending it as an HTTP header. I also define a user agent here. And that’s really all that’s set up there, not much to that configuration. And then if I jump over to my collectors, this is the collector that will pull data from my S3 bucket. Again, you can see that I have a fairly simple configuration in here. One thing that’s probably useful for everybody to know is this path. So when you write data to S3, when we write it as part of our destination, we define a partitioning scheme. And that partitioning scheme generally has tokens in it that we want to read back out, so we can figure out what to do with that data. So this partitioning scheme, the way this data is laid out, is I have a timestamp that starts with the year, the month, the day, and the hour, all separated by slashes. And then I have the source type. So by using this path definition, I can pull all those things back. Those first four things get factored into the time - the _time field variable. And then source type is read back so I can filter based on source type.
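To illustrate that kind of time-and-sourcetype partitioning, here is a small Python sketch that builds an S3 key from an event and then reads the tokens back out, the way a collector path definition recovers _time and sourcetype on replay. The exact key layout and file name are assumptions for illustration, not Cribl’s actual partitioning expression.

```python
from datetime import datetime, timezone

def build_key(event_time, sourcetype, filename):
    """Build an S3 key partitioned as year/month/day/hour/sourcetype/..."""
    t = datetime.fromtimestamp(event_time, tz=timezone.utc)
    return f"{t:%Y}/{t:%m}/{t:%d}/{t:%H}/{sourcetype}/{filename}"

def parse_key(key):
    """Recover the time and sourcetype tokens from a key on the way back in."""
    year, month, day, hour, sourcetype, _ = key.split("/", 5)
    t = datetime(int(year), int(month), int(day), int(hour), tzinfo=timezone.utc)
    return {"_time": t.timestamp(), "sourcetype": sourcetype}

key = build_key(1615906800, "access_combined", "batch-0001.json.gz")
print(key)             # 2021/03/16/15/access_combined/batch-0001.json.gz
print(parse_key(key))  # {'_time': 1615906800.0, 'sourcetype': 'access_combined'}
```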
Steve Litras: 00:18:11.079 If I jump down to fields, you’ll notice that I’m adding a field. So the collector allows me to add fields ad hoc. So I’m adding a field called Collected, and I’m setting it to true. Any time in our product that you see one of these fields with a blue color code around it, it means it’s a JavaScript expression. So true is very simple. It’s just true or false. I could put in all sorts of JavaScript. I could put in a full JavaScript expression and have it evaluate as the field. And then I’m routing the data into the routes. Now, I could optionally do this straight to a pipeline into an output. But for our purposes, I’m going to send it to the routes. And then there’s really nothing in the advanced setting. The things that I am always nervous about cutting and pasting are things like credentials and URLs. So that’s why these are pre-configured. Now, again, going back to our monitoring, we saw that there’s nothing going to the InfluxDB version 2 instance. And if I go to my routes, you see I have three routes. I have my archival route, and it’s using a filter. We have a preset of global variables, one that will tell us if an event is an Apache access combined log or not. So I’m using that as the filter here. Because when we generate metrics out of this stuff, we don’t necessarily change the source type. So I don’t want to send the metrics to my archive. I want to send the raw logs because I’m going to be able to replay those raw logs. That’s a choice I made, and that’s why I structured the filter this way.
Steve Litras: 00:20:00.526 The beauty of having JavaScript as our filtering language, amongst other things, is that you can decide how you want to filter this data when you set it up. Our logs-to-metrics route is also filtered on access combined. It runs the data through the logs-to-metrics pipeline which, basically, does aggregation. I’m not going to show the pipeline in this case, but it’s a good way to extract metrics from a log source, drastically reducing how much data you’re sending to the end system. And then our output right now is to InfluxDB, and that’s version 1. But I actually want to change that. I want to create a new destination, something we call an output router. An output router is useful because it gives you the ability to send the same data that would be normally sent to the destination, to multiple destinations, and do it with a certain amount of logic. So I’m going to name this metrics. And I’m going to set my first expression to true, which just means that every event that comes by will match. And I’m going to set my final to no. Now, if I were to change my router right now to point at this, it would act just like it does now, right, because it would - all the data would come through this output router, and it would get sent directly to InfluxDB 1, the instance 1.
Steve Litras: 00:21:37.856 But I actually have a second instance that I want to send data to. I’m also going to make it true. And I’m going to leave that as final. Now, I have a situation where I’m going to send the same data to both outputs. However, I am going to make a little change here, because when we do the collection, when I go back to fill this data back in, I don’t want the data I’m pulling from the past to go back through the original InfluxDB, the first instance. And if you remember when I showed you the collector, I added a field called Collected. So I actually want to be able to check and say that the data coming through, if it does not have the collected field in it, if it doesn’t have the collected field, then send it to InfluxDB 1. If it does have the collected field, then it’ll only go to InfluxDB 2. So this allows me to go back and do that collection - or that replay, which I’ll show you in a minute. Okay. So this is all set up. Now, again, because we’re a distributed system, we use git on the back end to manage our configurations. So any time I make a change, I need to go back and commit the change and deploy it. Now, you’ll notice I can’t hit the commit button yet. And that’s because I haven’t put in a commit message. Now, we kind of enforce a certain level of hygiene. We don’t check if it’s a good commit message because that’s totally subjective. But we do force you to put in a commit message. So I’ll just say, “Creating router,” and I hit commit. And then I do a deploy to the worker group, and that’s going to take a minute.
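The routing decision the output router makes can be sketched in a few lines of Python, assuming an event is just a dict and the field is literally named collected; in LogStream the rules are JavaScript expressions, so this is only the logic, not the actual configuration.

```python
def choose_outputs(event):
    """Decide which InfluxDB outputs should receive this event.

    Live data (no 'collected' field) goes to both the old and the new instance;
    replayed data (the collector stamps collected=true) goes only to the new
    instance, so history is not written back into version 1.
    """
    outputs = ["influxdb2"]          # second rule: expression true, every event matches
    if "collected" not in event:     # first rule: expression roughly like !collected
        outputs.append("influxdb1")
    return outputs

print(choose_outputs({"metric": "http_bytes", "value": 1043}))
# ['influxdb2', 'influxdb1']
print(choose_outputs({"metric": "http_bytes", "value": 1043, "collected": True}))
# ['influxdb2']
```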
Steve Litras: 00:23:30.880 So I’ll go back to our monitoring, and I’ll jump to our destinations. And because this is going to take a minute or so, I’m going to drop this down to one minute because, otherwise, it’s averaged over time, and it’ll take a little while to see. In the course of the next minute, you’ll start to see a tick-up in InfluxDB 2. Oh, oops - except I forgot one thing: I forgot to change the route. Sorry about that. So now that I’ve created the output router, I need to set the route so that the data goes through it. So I just selected that new output. I hit save. I go through my commit. And I deploy. So now, as you can see here, this one pipeline is now sending data. Instead of just InfluxDB 1, it’s sending it to that output router that I just configured. And if I jump over to monitoring, and go to destinations and drop my time down, you’ll see that now I have the router. It shows that there is a router there, that it’s in the output stream. And within the next minute, you should start seeing a bump of data there. One thing that I love about our interface is the ability to see data on the wire as it’s coming through. So I can actually - and this probably won’t show me anything right now because it’s in the middle of reconfiguring. But if I do - okay, there we go. Now, if I go here - I think I’m rushing it.
Steve Litras: 00:25:19.250 Yeah. I can capture data on the wire. So now that it’s here, I should be able to go to either one of these and see data. It wouldn’t be a demo if the demo-gods didn’t strike out at me a little bit. But you can see here’s the data that we’re catching right off the wire. These are the metrics that we’re sending downstream. And I can look at both for it. I can look at DB 1. I can look at DB 2 - InfluxDB 1 or InfluxDB 2. And I should be able to see data coming through. And there we go. Now, because I’m now sending this data, let’s jump over to our Grafana instance. This is the original instance. This is pointing to InfluxDB version 1. And you see I have a consistent set of data throughout. And if I go to the one that’s pointing at 2, and I run that query again, now I see there’s - obviously, there was no data flowing up until just now. But now I start to see that data here. So I now have data flowing to both systems. But I want to go back and fill some of that data in. So I go back to my collector, and I select it. Now, I can look at - I can run this collector in a number of different ways. I can just take a look at what data I have. It’ll run a - for some reason, the preview is not coming up.
Steve Litras: 00:27:00.230 But let’s go ahead and just do a full run. So there are three different modes to collectors. There’s preview, where I can actually capture data as it’s coming through. There’s discovery where, in an S3 type of environment, it will go out and go through all of the objects in my S3 bucket looking for the ones it’s going to collect. And there’s a full run, which also runs discovery but then goes and collects those files. I can set time ranges. So if you remember in that path definition, I set some tokens around time. So because of that, now I can do time-based recovery of data. So let’s say I just want to do a relative version, I can do this either absolute or relative. Let’s say I just want to go back and get the last hour’s worth of data. And it’s been probably - it’s been a little less than five minutes, but it’s close to five minutes since I actually got this configured. I’m stopping the data before it already - before the dual write started. And if I run this, I can look, and I see that I have - oh, oh. The demo-gods are biting me, of course. Let me check my configuration real quick. I promise I tested this like five or six times beforehand. It’s true [inaudible].
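As a rough sketch of what a relative, hour-based collection window amounts to against a bucket partitioned by year/month/day/hour/sourcetype, the snippet below enumerates the hourly prefixes for the last hour and lists the matching objects with boto3. The bucket name is a placeholder, and LogStream’s discovery phase does the equivalent of this (plus scaling the work out) for you.

```python
from datetime import datetime, timedelta, timezone
import boto3

def hourly_prefixes(start, end, sourcetype):
    """Yield one prefix per hour between start and end, inclusive,
    matching a year/month/day/hour/sourcetype key layout."""
    t = start.replace(minute=0, second=0, microsecond=0)
    while t <= end:
        yield f"{t:%Y}/{t:%m}/{t:%d}/{t:%H}/{sourcetype}/"
        t += timedelta(hours=1)

def discover(bucket, sourcetype, last_hours=1):
    """List archived objects for a relative window of the last N hours."""
    s3 = boto3.client("s3")
    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=last_hours)
    keys = []
    for prefix in hourly_prefixes(start, end, sourcetype):
        resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)  # one page is enough for a sketch
        keys.extend(obj["Key"] for obj in resp.get("Contents", []))
    return keys

# Placeholder bucket; a "full run" would then download these objects and
# replay their events back through the routes.
print(discover("my-logstream-archive", "access_combined", last_hours=1))
```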
[silence]
Steve Litras: 00:28:53.769 Okay. I don’t know what was going on, but I’ll have to go back and look at logs. What you see in this is I’ve actually done discovery, so discovery is complete. It’s discovered a bunch of events. Now, it’s - but okay. It’s discovered a bunch of events, and it’s collected a number of events. What I don’t know is if it’s actually [inaudible] those events down. Ah, there we go. It has. So now, if I go back over to this, and I rerun the query, I now see that I have all my data again. And there’s a little break here. It’s not as obvious as I’ve seen it before because I’ve had a wider break there. But that’s because I really just loosely did that rough window. But you can be very, very precise. You can set an absolute window. I can go back and look at exactly when I did that cutover. And then I can set my window to exactly that amount of time, so it just pulls back all of my data. Now, this is only an hour’s worth of data. Right. For a demo, I don’t want something that’s going to sit there and take 20 minutes or a half an hour. So pulling an hour’s worth of data was very quick. If I’m pulling a lot more data, this can run over time. The nice thing about the way we’re set up for the distributed environment is - I’m running this in Kubernetes. And I have a - my worker group is set up to autoscale. So because it’s pulling from S3 and that discovery figures out how many different files it’s going to need to collect from, it’ll actually scale up to do that. That’s not true of every environment. That is true of our Helm chart-deployed Kubernetes environment. And that is the gist of the demo. Let me jump back to share the slides.
[silence]
Steve Litras: 00:30:50.544 Caitlin, do you want to take it away?
Caitlin Croft: 00:30:53.184 Awesome. Thank you, Steve. So if you guys have any questions, I know there have been a few that have come in, but feel free to post more. Once again, I just wanted to remind everyone that InfluxDays EMEA 2021 is coming up. So May is just around the corner. So the conference itself will be on May 18th and 19th. And it is completely free. And we also have the Hands-on Flux Training, which will be a week prior on the 10th and the 11th of May. And like I mentioned before, there is a fee attached to that one. But that’s simply so that we can keep a really good student-to-instructor ratio. And we also have the free Telegraf training. Seats are limited for that. And it was super popular and sold out last year, so be sure you register for it soon. I hope to see you all there. It’ll be a really great event. If you were there last year, we did it with Zoom and Slack. And we’re definitely upping our game this year. We’re using a really fantastic online event platform, so it’ll make the experience even richer for our community. So looking forward to seeing you all there. All right. Steve, there’s lots of questions for you. And the first question is, once I am done with my upgrade, how hard is it to extract the product from my process?
Steve Litras: 00:32:25.343 Well, we recommend you don’t as a general rule. But if you’re still running like-to-like systems, right, it’s actually very easy to extract the product. You really would have to go back and configure your agents to point directly at the InfluxDB, and then that’s pretty much it.
Caitlin Croft: 00:32:45.988 Right. Is MQTT supported as an input and an output?
Steve Litras: 00:32:51.211 Yeah. I saw that one. I don’t remember if it is, specifically. We’re always adding ins and outs - or sources and destinations - to the product. But I will go back and look. And I do recommend - one thing that I didn’t talk about yet is that we also - just like Influx, we have a Slack community where most of our customers and partners are. I mean, it’s a great place to get these kind of questions answered. If I don’t have the answer here, I will try and come back and provide it to Caitlin so she can provide it to you guys. But nonetheless, that’s always a great place to get questions answered about our product.
Caitlin Croft: 00:33:35.291 Absolutely. And then I will also shamelessly plug InfluxData, also, we have our online forums, and we also have our own Slack channel that’s filled with community members as well as employees. They are ready to answer questions, so.
Steve Litras: 00:33:50.974 And I’m also on that Slack channel as well, so.
Caitlin Croft: 00:33:55.148 Yeah. So if there’s any questions, you can definitely check out both of our Slack workspaces. I have numerous 16-terabyte volumes (the largest size allowed on Amazon EBS) of archived data in InfluxDB 1.x format. When created, the databases used the default retention policy, which is forever. How can I migrate the data in these volumes to InfluxDB 2.x?
Steve Litras: 00:34:25.691 So yeah, this is a little bit of a challenge. Right. If you have that data, you don’t have it in the raw format or whatnot. You’re going to have to do some sort of export or import on that if you’re doing them in parallel. We do recommend that you put the replay capability in there, and do it fairly early on. That’s one of our core use cases: the ability to send off your raw data to a cheaper storage model and be able to replay it at any given point in time. Influx would probably provide you the better answer if you’re just trying to take that data specifically from InfluxDB 1, in the format that it’s in, and move it over. More than likely, I’d expect that’s a direct upgrade.
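For the historical data itself, one option outside of the pipeline is to read from 1.x and re-write into 2.x with the two official Python clients. The sketch below uses assumed hostnames, credentials, database, and measurement names, and a rough numeric-versus-string heuristic to split fields from tags; for multi-terabyte volumes you would batch by time range and measurement rather than pull everything in one query.

```python
from dateutil import parser as dateparser
from influxdb import InfluxDBClient as InfluxDB1Client      # 'influxdb' package (1.x API)
from influxdb_client import InfluxDBClient, Point           # 'influxdb-client' package (2.x API)
from influxdb_client.client.write_api import SYNCHRONOUS

# Assumed connection details for illustration.
v1 = InfluxDB1Client(host="influx1.example.com", port=8086, database="telegraf")
v2 = InfluxDBClient(url="http://influx2.example.com:8086", token="MY_TOKEN", org="my-org")
write_api = v2.write_api(write_options=SYNCHRONOUS)

# Pull one measurement for a bounded time range from the 1.x instance...
result = v1.query(
    "SELECT * FROM cpu WHERE time >= '2021-03-01T00:00:00Z' AND time < '2021-03-02T00:00:00Z'"
)

# ...and re-write each row as a point into a 2.x bucket. Treating numeric
# values as fields and strings as tags is a rough heuristic for a sketch.
points = []
for row in result.get_points(measurement="cpu"):
    p = Point("cpu").time(dateparser.isoparse(row.pop("time")))
    for key, value in row.items():
        if value is None:
            continue
        p = p.field(key, value) if isinstance(value, (int, float)) else p.tag(key, str(value))
    points.append(p)

write_api.write(bucket="cribl", record=points)
v2.close()
```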
Caitlin Croft: 00:35:16.061 Right. Next one. I have a question about the performance of Cribl. I have about 200 gigs of data in InfluxDB 1.x, and I would like to migrate to InfluxDB 2. How long could it take using Cribl?
Steve Litras: 00:35:34.428 Well, if you’re doing it from the S3 bucket type approach, the correct answer is it depends, just like you always get from any consultant. Right. It totally depends on the amount of horsepower you put it through. We have a sizing guideline of about 400 gigs a day of throughput per recommended system type, which is - I can’t remember the exact spec on it. But if you look at 400 gigs a day on one instance as our kind of baseline, then 200 gigs is half of that, and a single instance can easily handle it. But if you spread that out, as I mentioned, through an autoscale or through a scaling mechanism, you can definitely speed that up.
Caitlin Croft: 00:36:29.628 Perfect. Can you pull data from another source and enrich the events being processed with that data?
Steve Litras: 00:36:37.637 Ah, yes. So we have an enrichment capability where we can do - so we started out with our enrichment capability using CSV lookups. Right. So you’d feed it a CSV file, and you’d use that to enrich data as you go. Recently, with our 2.4 version, we’ve implemented Redis. In the past, we weren’t really able to do a dynamic lookup - the “Yeah, I have the CSV, but it continually gets updated, or it gets updated at a certain point of time every day” kind of thing. That was all manual. With Redis, we now have the ability - and we’re doing this in our production environment - where we’re doing pulls from things like threat feeds. We’re doing REST API pulls from threat feeds to feed into Redis, which we’re then using to enrich data as it comes through. So we can do that. We can do that all day long, and we do it in a number of different places. But the approach with Redis allows us to have it all within the product. I can do a collector that reads whatever REST API, S3 bucket, whatever, and feeds my enrichment data into Redis. And then I have the pipeline that’s actually managing the data that then uses Redis to look that up and enrich the data with it. So absolutely, we can do that.
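Conceptually, that pattern looks like the Python sketch below: one collector-style job pulls a feed and loads it into Redis, and the event path looks each event up in Redis to enrich it as it flows through. The feed URL, key names, and field names are made up for illustration; in LogStream this is done with the Redis function inside a pipeline rather than hand-written code.

```python
import json
import redis
import requests

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def load_threat_feed(url):
    """Collector-style job: pull a (hypothetical) REST threat feed and load it
    into Redis as ip -> JSON lookups, expiring after a day."""
    for entry in requests.get(url, timeout=10).json():
        r.set(f"threat:{entry['ip']}", json.dumps(entry), ex=24 * 3600)

def enrich(event):
    """Pipeline-style step: look the source IP up in Redis and, if it is a
    known bad actor, attach the threat metadata to the event."""
    hit = r.get(f"threat:{event.get('src_ip')}")
    if hit:
        event["threat"] = json.loads(hit)
    return event

# load_threat_feed("https://feeds.example.com/bad-ips.json")  # hypothetical feed URL
print(enrich({"src_ip": "203.0.113.7", "bytes": 1043}))
```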
Caitlin Croft: 00:38:03.759 Awesome, and I just wanted to let you know, Steve, someone definitely gave you a shout-out for the joys of doing live demos. [laughter] Yeah. But hey, it happens, and you just go with it, right?
Steve Litras: 00:38:19.204 Exactly. Exactly. You have to roll with the punches.
Caitlin Croft: 00:38:22.889 It makes live demos more interesting, right? [laughter]
Steve Litras: 00:38:25.413 Correct. Yeah. I don’t like that you’re talking to a video all the time, right?
Caitlin Croft: 00:38:30.667 Exactly. Yeah. Well, if anyone has any more questions for Steve, please feel free to put them in the Q&A or the chatbox. It was a really great session. It looks like a lot of people had questions. So you all should have my email address. So if you have any questions that you think of after the fact, please feel free to email me. And I will connect you with Steve, and he can help you out. This webinar has been recorded. It will be made available later today. The good thing is once it’s recorded and posted - it’ll be posted directly to the registration page. So if you just go back to the webinar registration page this evening, the recording will be there in addition to the [inaudible]. So it’s super easy to go and find the recording. Well, it doesn’t look like there’s any more questions. So thank you very much, Steve, for your fantastic presentation. And I hope to see the rest of you at future InfluxData events.
Steve Litras: 00:39:38.371 Absolutely. Thanks for all your time, everybody. Much appreciated.
Steve Litras
Director of Technical Marketing, Cribl
Steve Litras is the Director of Technical Marketing at Cribl, makers of LogStream, which allows enterprises to process log data before they pay to analyze it, getting the right data where they want it, in the formats they need. LogStream is first of its kind, purpose-built for logs, and helps customers reuse their existing investments in proprietary log pipelines to send data to their multitude of tools, while securing the contents of their data and controlling costs.
Before Cribl, Steve ran both the global infrastructure team and the Enterprise Architecture team at Autodesk. Steve has been a passionate advocate of using log data to improve operations ever since he first laid hands on Splunk in 2006, and is thrilled to be helping Cribl on its mission to help customers unlock the value of all of their machine data.