Let's Compare: Benchmark Review of InfluxDB and OpenTSDB
In this webinar, Ivan Kudibal and team will compare the performance and features of InfluxDB and OpenTSDB for common time series workloads, specifically looking at the rates of data ingestion, on-disk data compression, and query performance. Come hear how Ivan conducted his tests to determine which time series database would best fit your needs. We will reserve 15 minutes at the end of the talk for you to ask Ivan directly about his test process and independent viewpoint.
Watch the Webinar
Watch the webinar “Let’s Compare: Benchmark Review of InfluxDB and OpenTSDB” by filling out the form and clicking on the download button on the right. This will open the recording.
[et_pb_toggle _builder_version="3.17.6" title="Transcript" title_font_size="26" border_width_all="0px" border_width_bottom="1px" module_class="transcript-toggle" closed_toggle_background_color="rgba(255,255,255,0)"]
Here is an unedited transcript of the webinar “Let’s Compare: Benchmark Review of InfluxDB and OpenTSDB.” This is provided for those who prefer to read rather than watch the webinar. Please note that the transcript is raw. We apologize for any transcribing errors.
Speakers:
- Chris Churilo: Director Product Marketing, InfluxData
- Ivan Kudibal: Co-founder and Engineering Manager, Bonitoo
- Vlasta Hajek: Software Developer, Bonitoo
- Tomáš Klapka: DevOps Engineer, Bonitoo
Chris Churilo 00:00:03.057 All right. As promised, we will get started three minutes after the hour. So welcome, everybody. Thanks for joining us in our webinar. We’ll be going over the process as well as the results of our benchmarking tests of InfluxDB and OpenTSDB. I’m proud to introduce to you our friends and partners at Bonitoo; Ivan, Vlasta, and Tomas will be taking you through this presentation. And if you have any questions, feel free to put them in the chat or the Q&A, whichever is convenient for you, and we’ll make sure we get them answered. So without further ado, guys, I’ll let you take it away.
Ivan Kudibal 00:00:44.285 Thank you, Chris. So hello, everybody. My name is Ivan Kudibal. I run a company called Bonitoo. And today, together with Vlasta Hajek and Tomas Klapka, we are going to present our benchmarking of InfluxDB compared with OpenTSDB. We were asked by InfluxData to do these benchmarks in an unbiased and fair way. And in the next 60 minutes, we are going to present not only the results, but also the process of benchmarking.
Ivan Kudibal 00:01:33.418 The mission was to perform the benchmarks on the latest releases of InfluxDB and OpenTSDB. So basically, we refreshed the benchmarking effort and followed the tests that were first conducted in 2016 by Robert Winslow. For the benchmarking, we used the existing framework called InfluxDB-comparisons. This is the fourth webinar in the Let’s Compare series; previously, we did webinars on other databases. Today, it is about OpenTSDB. And it’s the first time we can say that the two systems under comparison, OpenTSDB and InfluxDB, are both time series databases, unlike MongoDB, Cassandra, or Elasticsearch. Both databases are designed specifically to store time series-based measurements. So it’s the first time that we can tell you we are going to present the results of comparing apples to apples.
Ivan Kudibal 00:02:54.891 The webinar is going to have five parts. The first one is an introduction to the InfluxDB-comparisons framework. Then we are going to demo InfluxDB-comparisons against an OpenTSDB deployment. We want to show you the installation, setup, and configuration options that we used for OpenTSDB. And finally, we are going to show the results and conclusions. There will be time for Q&A at the end of the webinar. So thank you. Let’s dive into Part One, the introduction to the InfluxDB-comparisons framework. And I will ask Vlasta to start.
Vlasta Hajek 00:03:56.061 Hi. Sorry for the technical problems with the presentation. It got a bit slow. Here we are. So as Ivan told you, we are renewing the benchmarks that were run in 2016, now with the latest versions of InfluxDB and OpenTSDB. The previous measurements were done with InfluxDB 1.0 and an OpenTSDB 2.3.0 release candidate, and now we have InfluxDB 1.4.2 and OpenTSDB 2.3.0 final. I will briefly describe the methodology and the framework used to benchmark InfluxDB and OpenTSDB. The methodology and framework were originally designed and developed by Robert Winslow, and we have used them without any major modifications. Robert already explained the framework and the testing approach in detail in his webinars a year ago, and if you would like to hear more details, I advise you to watch those webinars. You can easily find them on the InfluxData website.
Vlasta Hajek 00:05:44.119 So, what is the methodology? We could spend hours discussing the best benchmarking methodology, but of course, there are common properties that each approach must hold. Especially, it must be realistic, representing a use case with wide real-world usage. It must be fair and unbiased to all databases, and it must also be reproducible. In our case, we’ve selected a common use case: a DevOps engineer who maintains fleets of tens or hundreds, maybe even thousands of machines, and is watching a dashboard showing metrics gathered from those servers. Metrics such as CPU usage, memory, disk space, disk IO, or network, Nginx, PostgreSQL, Redis. Those metrics are collected by agents running on the servers, such as Telegraf, and they are sent to a central database. Those metrics would be gathered every 10 seconds from hundreds of servers, so we would see an almost continuous stream of data going into the database.
Vlasta Hajek 00:07:31.724 So we have from six to 12 measured values per measurement, which means 11.2 values on average. And as there are nine measurements in total, we are getting almost 100 total values written to the database from the measurements on each host. Each measurement also has 10 tags, mostly system info and metadata such as host name, region, data center, OS version, and so on. The metrics are generated using the framework’s data generator, employing a random walk algorithm to achieve plausible variance in the data. And in our measurements, we used datasets of 24 hours’ duration, with simulated data gathered from 100 hosts.
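As a rough illustration of the random-walk idea mentioned above (a minimal sketch, not the framework’s actual generator code; all names and parameters here are illustrative):

```go
package main

import (
	"fmt"
	"math/rand"
)

// Minimal random-walk sketch: each new sample is the previous value
// plus a small random step, clamped to a plausible range, so the
// series varies smoothly like a real CPU-usage metric would.
func main() {
	value := 50.0 // hypothetical starting CPU usage, in percent
	for i := 0; i < 10; i++ {
		value += rand.Float64()*10 - 5 // random step in [-5, +5)
		if value < 0 {
			value = 0
		} else if value > 100 {
			value = 100
		}
		// one sample every 10 seconds, matching the collection interval
		fmt.Printf("t+%03ds usage_user=%.2f\n", i*10, value)
	}
}
```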
Vlasta Hajek 00:09:06.355 So, what do we measure? For the time series use case, as for other types of data storage, it is definitely important to measure the ingestion rate, that is, how quickly the database ingests the load, or how much data the database can handle at a time. We measure this in values per second, and the higher the value, the better. It’s also important to know how the data is transferred to disk and what the final disk usage is, in other words, how efficiently the database engine uses the provided space. Here, the less, the better. And of course, we would like to perform queries and read the data as fast as possible, to have a fluid dashboard of metrics. Here we measure the time interval during which a query is processed, and we calculate the possible number of queries per second. The higher, the better. As our use case is storing time series data, where there are no updates or deletes, we did no other measurements.
Vlasta Hajek 00:11:02.922 So, about the benchmarking framework. It’s written in the Go language, and it consists of a set of tools, specific for each database and for each phase of the benchmarking process. In the first phase, we generate the input data in a wire format specific to each database. For InfluxDB, the data are plain text; OpenTSDB uses JSON. Using the native format reduces further overhead during the load. Then we take the pre-generated data and transfer it over the network to the database. To achieve the best performance, we use the fasthttp library for HTTP communication, which is used with both databases. And we do it in chunks, so we use the bulk APIs of both databases. And similarly for queries.
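To make the write path concrete, here is a minimal Go sketch of a bulk write over HTTP with fasthttp, assuming OpenTSDB’s standard /api/put endpoint on the default port 4242. This illustrates the approach; it is not the framework’s actual loader code, and the URL and payload are illustrative:

```go
package main

import (
	"fmt"
	"log"

	"github.com/valyala/fasthttp"
)

func main() {
	// An illustrative one-point batch; real batches carry many points.
	payload := []byte(`[{"metric":"cpu.usage_user","timestamp":1514764800000,` +
		`"value":58.1,"tags":{"hostname":"host_0"}}]`)

	req := fasthttp.AcquireRequest()
	resp := fasthttp.AcquireResponse()
	defer fasthttp.ReleaseRequest(req)
	defer fasthttp.ReleaseResponse(resp)

	// POST the JSON batch to OpenTSDB's bulk write endpoint.
	req.SetRequestURI("http://localhost:4242/api/put")
	req.Header.SetMethod("POST")
	req.Header.SetContentType("application/json")
	req.SetBody(payload)

	if err := fasthttp.Do(req, resp); err != nil {
		log.Fatal(err)
	}
	fmt.Println("status:", resp.StatusCode())
}
```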
Vlasta Hajek 00:12:30.704 And in a format specific to each database, we generate various queries. Queries are generated from a template, with a randomized time window and host condition. The generated queries are fed to the query benchmarking tool, which sends the query requests to the database, measures the response times, and calculates the max and min values, which are then included in the final summary. To achieve the best query performance, the benchmarking tool doesn’t validate the results. However, it’s possible to use special debug options, which print pretty-formatted responses, so you can compare the responses from both databases and see if they match, value by value.
Vlasta Hajek 00:13:56.362 So let’s look at examples of the data formats. Here we see OpenTSDB; as I said, it uses a JSON format. OpenTSDB defines a fixed schema where we have a metric, a timestamp, tags, and finally a value. It means that we can insert only one value at a time. The OpenTSDB timestamp uses millisecond precision, and there must be at least one tag. In the default configuration, OpenTSDB specifies that the maximum number of tags is eight, so for our benchmarking, we had to raise this limit.
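For reference, a bulk write to OpenTSDB’s /api/put endpoint carries a JSON array of datapoints shaped like this (the metric name, tag values, and numbers are illustrative, not taken from the actual benchmark dataset):

```json
[
  {
    "metric": "cpu.usage_user",
    "timestamp": 1514764800000,
    "value": 58.13,
    "tags": {
      "hostname": "host_0",
      "region": "us-west-1",
      "datacenter": "us-west-1a"
    }
  }
]
```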
Vlasta Hajek 00:15:06.850 InfluxDB uses the so-called line protocol format, which is nicely human-readable and also easy to write. It consists of four parts: the name of the measurement, a set of tag key-value pairs, a set of field key-value pairs, and the timestamp, which for InfluxDB is, by default, in nanosecond precision. This whole line comprises a so-called point, and as you can see, such a point can write several values at a time. So it should be understood that InfluxDB has a fixed schema of measurement name, tags, and fields, but it’s flexible as regards the number of tags and fields. Any point can add a new tag or field, and tags can even be skipped entirely; just one field is required. Here you see example points from the network and PostgreSQL measurements.
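An illustrative line protocol point for the cpu measurement might look like this (the tag and field values are made up; note the measurement name, the tag set, the field set, and the trailing nanosecond timestamp):

```
cpu,hostname=host_0,region=us-west-1,datacenter=us-west-1a usage_user=58.13,usage_system=2.77,usage_idle=24.84 1514764800000000000
```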
Vlasta Hajek 00:16:44.460 For measuring query performance, we use a query that would be a typical example of what someone would run against this data. In our case, the query asks for the maximum CPU usage for a given host over the course of a random hour interval, grouped by one-minute intervals. So here, we have an example of an OpenTSDB query. OpenTSDB offers a flexible query language, but on the other hand, it’s quite complex. It’s also JSON, so it’s understandable to humans, but not so easy to write; you have to be very careful about quotes, parentheses, and so on. At the beginning, we have the time bucket; we use milliseconds for the start and the end to define our one-hour time window, with aggregation over one minute. Then we have the tag filters, where we are looking for a host name; in our example, host_0.
Vlasta Hajek 00:18:29.118 And in the second part, we finally have the metric to query and the output. One remark here on querying OpenTSDB: Robert Winslow mentioned in his previous OpenTSDB benchmarks webinar, which was run against OpenTSDB 2.3.0 release candidate 2, that query results from OpenTSDB were not always accurate, somewhat fuzzy. And there was a note that this should be fixed in the final 2.0 API; that means version 2.0 of the query endpoint, not version 2.0 of OpenTSDB. And we are glad that this is now the case: OpenTSDB 2.3.0 final returns correct results. At least when comparing to InfluxDB results, either both are correct or both are wrong in the same way.
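Putting both parts together, an OpenTSDB /api/query request of the kind described (max CPU for one host over one hour, downsampled to one-minute buckets) could look roughly like this; the timestamps and tag values are illustrative:

```json
{
  "start": 1514764800000,
  "end": 1514768400000,
  "queries": [
    {
      "aggregator": "max",
      "metric": "cpu.usage_user",
      "downsample": "1m-max",
      "tags": {
        "hostname": "host_0"
      }
    }
  ]
}
```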
Vlasta Hajek 00:19:53.286 So here is the example of the InfluxDB query. InfluxDB uses the InfluxQL query language, which, as you can see, is very similar to SQL. It is both writable and readable. So we have the SELECT part, the WHERE condition, and the aggregation. As you can see, InfluxDB’s queries are quite a bit shorter compared to OpenTSDB’s. And now, in the latest webinar from this Let’s Compare series, we can say that of all the databases compared so far, InfluxDB has the shortest and most concise query format. So now you hopefully know at least a little about the methodology and the framework, so let’s see how easy it is to use. Tomas will now show a demo of the benchmarking tools with OpenTSDB.
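The equivalent InfluxQL query would look something like this (field, tag, and time values are illustrative):

```sql
SELECT max(usage_user)
FROM cpu
WHERE hostname = 'host_0'
  AND time >= '2018-01-01T00:00:00Z'
  AND time < '2018-01-01T01:00:00Z'
GROUP BY time(1m)
```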
Tomas Klapka 00:21:12.112 Okay. Thank you, Vlasta. I’m going to share my terminal window.
[silence]
Tomas Klapka 00:21:33.266 Okay. Hello, everybody. In the following couple of minutes, I’m going to show you a simple practical demonstration of the benchmark comparison tools. Today, I’m going to benchmark OpenTSDB, which we run on a single EC2 instance. It consists of one TSD daemon on HBase, with the Hadoop file system underneath. Firstly, if you don’t have the Go binary already installed on your machine, feel free to download it from the official website. There are also useful guides on how to install it and get it working on most operating systems. After setting up your Go installation, you are ready to get the command-line tools from the remote Git repository. I’m going to use the go get command, which is documented as the standard way of downloading and installing all the necessary packages and their dependencies. In this example, you will need four tools from the repository, and after the go get command is done with all its magic, we will be ready to run.
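The four go get commands look roughly like this; the import paths below are assumed from the public influxdata/influxdb-comparisons repository, and the exact layout may differ by version:

```
go get github.com/influxdata/influxdb-comparisons/cmd/bulk_data_gen
go get github.com/influxdata/influxdb-comparisons/cmd/bulk_load_opentsdb
go get github.com/influxdata/influxdb-comparisons/cmd/bulk_query_gen
go get github.com/influxdata/influxdb-comparisons/cmd/query_benchmarker_opentsdb
```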
Tomas Klapka 00:23:00.974 So let’s get the bulk_data_gen. Done. And for loading our dataset into the database, we will use the bulk_load_opentsdb tool, as you can see on my terminal. And similarly to the first bulk_data_gen example, we need something for generating queries, so we get bulk_query_gen and, finally, the query benchmarker tool. Okay. Now let’s have a look at the dataset generation and ingestion commands. I’m going to use the smallest set of parameters possible, and the others remain at their defaults, so I do not use them at all. So bulk_data_gen generates the dataset according to input parameters, like the use-case parameter and the time window, and its output is used as the input stream for the next command, called bulk_load_opentsdb. And in this demo case, I’m using a two-hour dataset, as I’ve told you before, to make it faster actually. Sorry.
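A generate-and-load pipeline of the kind described could look roughly like this; the flags follow the repository’s documentation, but the timestamps, scale, and URL are illustrative, so adjust them to your setup:

```
bulk_data_gen -use-case devops -format opentsdb \
  -scale-var 100 \
  -timestamp-start "2018-01-01T00:00:00Z" \
  -timestamp-end "2018-01-01T02:00:00Z" \
  | bulk_load_opentsdb -urls http://localhost:4242 -workers 4
```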
Tomas Klapka 00:25:43.162 Okay. Now here we got the ingestion rate, in values per second, as we can see here. And after we are done with the data loading, we want to find out the query times. Now I’m going to show you the set of commands for query generation and the actual query execution. So here, we have the bulk query generator, which has the following parameters: queries for one host over one hour. In this case, the format will be OpenTSDB, and we’ll also stay with the two-hour time series window. Our use case is DevOps, as before, and we will need 100 queries for our benchmark purposes.
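The query generation and execution step could look roughly like this; again the flags are assumed from the repository’s documentation, while the query-type name, URL, and time window are illustrative:

```
bulk_query_gen -use-case devops -format opentsdb \
  -query-type "1-host-1-hr" -queries 100 \
  -timestamp-start "2018-01-01T00:00:00Z" \
  -timestamp-end "2018-01-01T02:00:00Z" \
  | query_benchmarker_opentsdb -urls http://localhost:4242
```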
Tomas Klapka 00:27:24.654 Okay. It returned the maximum value of the CPU metric, for one random host and one random hour, grouped by one minute. And as I told you before, we used 100 queries to benchmark our OpenTSDB instance and our dataset. With regard to the actual results, if we look at the mean time, we can see almost six milliseconds per query and 167 queries per second. I think that’s pretty much all, and I hope this showed you how easy it can be to start with these tools from scratch. When it comes to more complex usage, there can be quite a high number of parameters; they can help you successfully accomplish your use case, but they were not needed for my demo purposes. In case you want to benchmark something more complex, you would need to adjust the parameters to be happy with your results. So thank you for watching, and don’t forget to try it. All these tools are publicly available in the InfluxData benchmark comparison Git repository, under the MIT license. And Vlasta, you can go on, please.
Vlasta Hajek 00:29:52.904 Okay. Thank you, Tomas. So now let’s see how we actually set up the databases and ran the benchmarks. So, what hardware did we use? We ran the benchmarks on two types of hosts: in one case, cloud-based EC2 hosts, and in the second case, in-house machines. We also wanted to validate whether someone should be worried about the performance of cloud-based machines compared to in-house machines, given the nowadays trend toward virtual machines and the cloud. We had HP servers with the Intel Xeon E5-2640 running at 2.60 GHz, 2x8 cores, and [inaudible]. Those machines had a little drawback: [inaudible]. In AWS, we chose the c4.4xlarge instances, which have similar parameters, but with [inaudible] CPU and EBS [inaudible] SSD drives. So you can say the results are quite comparable between what was measured on AWS and [inaudible], but in AWS, the specification was a little faster.
Vlasta Hajek 00:32:08.998 Both databases used a single-host deployment. In the case of InfluxDB, we used the default configuration without any tweaking. OpenTSDB is quite complex to set up; we chose a single-node cluster based on the HBase 1.2.6 Pseudo-Distributed Local Install, storing its data in a single-node Hadoop cluster, version 2.7.4.
Vlasta Hajek 00:32:56.506 So OpenTSDB is built on top of HBase, which comes with a lot of complexity, and OpenTSDB adds complexity of its own. Overall, OpenTSDB configuration is not easy. By trying, discovering, and searching, we finally managed to do our benchmarking, but we had to make a lot of changes in the configuration. As I mentioned, OpenTSDB has a limit on the number of tags per value, set to 8 by default, and we had to raise it since we have 10 tags. For bulk insertions, we had to enable chunked requests, and to improve performance, we had to increase the tsd.http.request.max_chunk parameter. We had to enable tsd.core.auto_create_metrics so that metrics could be created automatically. We also had to enable tsd.core.uid.random_metrics to relieve a metadata write coordination issue, because we saw OpenTSDB errors about conflicting UIDs for the same metric. We had to disable OpenTSDB compaction to make write performance predictable and to prevent the TSD servers from timing out. And we enabled LZO compression to save disk space.
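Collected in one place, the opentsdb.conf changes just described would look roughly like this; the keys are standard OpenTSDB settings, but the exact values shown are illustrative rather than the precise ones used in the benchmark:

```
# Allow more than the default 8 tags per datapoint (we use 10)
tsd.storage.max_tags = 10
# Accept chunked HTTP requests for bulk writes, with a larger chunk cap
tsd.http.request.enable_chunked = true
tsd.http.request.max_chunk = 1048576
# Create metrics on first write instead of failing
tsd.core.auto_create_metrics = true
# Randomize metric UIDs to avoid UID-assignment contention
tsd.core.uid.random_metrics = true
# Disable TSD compaction for predictable write performance
tsd.storage.enable_compaction = false
# Note: LZO compression is configured on the HBase table, not here.
```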
Vlasta Hajek 00:35:21.809 When we were setting up OpenTSDB for the pseudo-distributed single-node installation, we had to read various docs, and it was not enough. By trial and error, the way I think everyone searching for a solution proceeds, we had to discover the right configuration options. We also had to edit multiple configuration files, in multiple formats: XML and plain text. As you probably know, XML is not so user-friendly, but it is still manageable. So after the long wait, it’s finally time to show the results, and Ivan will share our results and conclusions.
Ivan Kudibal 00:36:29.016 Thank you, Vlasta. You can keep sharing the presentation and I will just continue. Here are the results, starting with the numbers for the ingestion rates of InfluxDB and OpenTSDB. This is the case with 100 servers, a ten-second interval, a 24-hour dataset, and four workers writing in parallel against the databases. On the AWS c4.4xlarge instances, the InfluxDB ingestion rate was 1.4 million inserts per second. OpenTSDB’s number was almost 10 times smaller, reaching 150,000 inserts per second. As for the query rate, you can see we made InfluxDB work at a rate of 820 queries per second, while OpenTSDB was able to serve 111 queries per second in its best case. Also, as for the size on disk, the total disk space used after the data was inserted was only 145 megabytes for InfluxDB, while OpenTSDB needed to allocate over one gigabyte. So we can say, quite simply, that InfluxDB has roughly 10 times faster ingestion, about 7 times faster queries, and about 8 times more efficient disk space usage.
Ivan Kudibal 00:38:58.016 And if we go to the next slide, there are the conclusions. Based on our experience from the testing and our experience with software configuration and setup, I can clearly tell you that InfluxDB better fits the use case of monitoring a fleet of VMs. Even if we talk about single-node testing, InfluxDB will serve better because it has excellent performance. To achieve the same performance with OpenTSDB, you would have to scale OpenTSDB horizontally, and the cost of time would be a side effect, as well as the cost of hardware. OpenTSDB also falls behind in the comparison of configuration difficulty. Simply put, HBase is one of the weakest points of OpenTSDB tuning; you need an expert who has the time and ability to tune HBase in order to lift the performance of OpenTSDB. In the future, upgrading OpenTSDB or migrating the data may cause headaches, because people will typically make progress only by working through errors. The documentation of OpenTSDB is not straightforward, and often the only way to make progress is to see the errors and figure out the error messages.
Ivan Kudibal 00:41:11.847 Well, again, the results and numbers will be published in a technical paper and a blog post after this webinar. You are also welcome to try the InfluxDB-comparisons framework yourselves; Tomas gave you a pretty good procedure for how to run it. We are also eager to hear from you about your experience with InfluxDB-comparisons or with the results, and we await any kind of feedback. As for technical issues, you can contact us directly at [email protected], and we’ll be happy to answer.
Chris Churilo 00:42:24.727 Awesome. If anybody has any questions, feel free to put them in the chat or the Q&A. And we’ll stay on the line just for the next couple of minutes and wait for your questions.
[/et_pb_toggle]