How to Improve Performance Testing Using InfluxDB and Apache JMeter
Session date: Jul 30, 2020 08:00am (Pacific Time)
Apache JMeter is a useful tool for running performance tests across different servers. To monitor the results, SAP chose to integrate JMeter with InfluxDB, the time series database, to collect and store test transactions, and they use Grafana to visualize real-time performance metrics. What happens if your database goes down, for any reason? It could be because too many JMeter threads are trying to write to the database, or because Grafana is querying too large a volume of transaction data during a performance test. Discover how SAP improves their performance monitoring team's productivity.
In this webinar, Subhodeep Ganguly will cover:
- SAP's approach to recovering transactions lost due to database failure
- How JMeter execution threads store data in a temporary flat/CSV file compatible with InfluxDB
- How they reduce recovery times and improve automated performance testing
- Using the influx-replay tool as a plugin or compact JAR file during the execution of an end-to-end performance test
Watch the Webinar
Watch the webinar “How to Improve Performance Testing Using InfluxDB and Apache JMeter” by filling out the form and clicking on the Watch Webinar button on the right. This will open the recording.
Here is an unedited transcript of the webinar "How to Improve Performance Testing Using InfluxDB and Apache JMeter". This is provided for those who prefer to read rather than watch the webinar. Please note that the transcript is raw. We apologize for any transcription errors.
Speakers:
- Caitlin Croft: Customer Marketing Manager, InfluxData
- Subhodeep Ganguly: Senior Developer - Performance Engineering, SAP
Caitlin Croft: 00:00:04.438 Hello everyone again. Welcome to today’s webinar. Super excited to have Subhodeep from SAP discussing how to improve performance testing with InfluxDB and JMeter. All right. Off to you, Subhodeep. Oh, I think you’re muted.
[silence] Caitlin Croft: 00:00:44.807 Hello. You might be muted on the headset.
[silence]
Caitlin Croft: 00:01:02.466 Apologies, everyone. The joys of technology. We still can’t hear you unfortunately. It’s amazing how you can test this all out and then sometimes there’s still a couple of kinks. Well, while we wait for Subhodeep to get that sorted, we will - yes, thank you, Ed. We’re figuring this out. So we’ll get there. Subhodeep, we can see your slides, but we can’t hear you unfortunately.
[silence]
Caitlin Croft: 00:02:16.717 Subhodeep, sometimes there’s a setting or a toggle on the headset that can mute the actual headset. So I’m wondering if maybe it’s that. All right. Looks like he - apologies, everyone. We’re just having a couple of technical difficulties here, but we will get started.
Subhodeep Ganguly: 00:02:51.396 Yeah. Am I audible now?
Caitlin Croft: 00:02:52.952 Yes, we can hear you. I’m so excited.
Subhodeep Ganguly: 00:02:57.353 Cool.
Caitlin Croft: 00:02:59.545 And then, it looks like you need to re-share your slides.
Subhodeep Ganguly: 00:03:03.915 Correct, correct. Yes, I’ll do that. Sure.
Caitlin Croft: 00:03:08.530 Perfect.
Subhodeep Ganguly: 00:03:15.811 Okay. So hi, all, and a warm evening to everyone who has joined this webinar. Thanks for joining. Today I'll be explaining, and also giving a small demo of, our end-to-end performance framework and the replay tool which we developed at SAP and filed a patent for a few months back. The replay framework works as a product for replaying any kind of transaction from a performance or automation framework. It is packaged as a compact JAR file which can be imported for any kind of database and any kind of data stream, so those systems can use this framework to replay transactions during and after execution of a test. I will slowly explain the working model and also the monitoring tools we use. At SAP, we use these open source frameworks and tools; we build a framework on top of the open source tools and then deliver a large-scale monitoring system. That is the main agenda for today's webinar.
Subhodeep Ganguly: 00:04:46.833 So this is what I am going to cover today: a bit about me and my experience, what exactly we do in the performance engineering group, what kind of architectural model we use, what monitoring tools we use, and what kind of test result validation and analysis model we use, which helps all the performance test engineers find bottlenecks in the application. That is the main objective. Using those result analyses and those metrics, the performance test engineers and the QA teams are able to validate the correctness of the application and find whether there are any bottlenecks in it. Then I'll cover what kind of replay framework we use, how it benefits all the stakeholders, and how it scales the application-level monitoring and the end-to-end architecture. That is a very high-level brief of today's agenda.
Subhodeep Ganguly: 00:06:06.286 The next slide is a very short brief about me. Currently, I'm working with SAP; I joined around one and a half years back, and I mainly work on performance test models. I have worked on end-to-end automation testing projects involving web, Angular, and mobile-based automation. I also developed end-to-end test frameworks using UFT, Selenium, TestComplete, and LoadRunner in my previous organizations. I have worked with JMeter at SAP, which is a replacement for LoadRunner. I work with programming languages like Java and Python, and I also have good experience with VB scripting, Ansible, and others. So that is a very short brief of my experience.
Subhodeep Ganguly: 00:07:12.809 In this slide, I'll explain what led us to use InfluxDB and Grafana, and later on the ELK stack, that is Elasticsearch, Logstash, and Kibana; what the requirement and the use case were that made us pick InfluxDB as a time series database and Grafana as the metrics model. I'll explain this briefly. As we know, InfluxDB is a time series and analytics database, it is open source, and with very minimal effort we can stand up the whole end-to-end DB infrastructure. Currently we are using InfluxDB 2.0; we have used 1.7 and 1.9, and now we are on the 2.0 version. We use Telegraf as the server monitoring agent on all the application servers, and we use the Flux scripting language for running all the necessary queries against InfluxDB and for building graph models to see whether there are any issues anywhere in the infrastructure, for example any memory or CPU leaks. All of these things we query through Flux. We also looked at Chronograf and Kapacitor. Kapacitor we do not use much, but for developing an alert monitoring and [inaudible] system, we did a POC on Kapacitor; later on, we developed our own in-house alert monitoring model, an alert [inaudible] model, using Perl and Ansible. So Kapacitor we have not used much, but Telegraf, InfluxDB, and Grafana we use very heavily across the entire project. And because InfluxDB is a time series database, handling any issue in the database or anywhere in the infrastructure was one of the major requirements. Those were the problems [inaudible] which led us to create a replay framework that can recover from any kind of failure in the infrastructure, which I'll explain in my upcoming slides.
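To make the Telegraf and Flux piece above concrete, here is a minimal sketch of the kind of host-metrics query the framework might run against InfluxDB 2.x from Java, using the influxdb-client-java library. The URL, token, org, and the "telegraf" bucket name are placeholder assumptions; "mem" and "used_percent" are the default measurement and field written by Telegraf's mem input plugin.

```java
import com.influxdb.client.InfluxDBClient;
import com.influxdb.client.InfluxDBClientFactory;
import com.influxdb.query.FluxRecord;
import com.influxdb.query.FluxTable;

import java.util.List;

// Sketch: pull an hour of Telegraf memory metrics and look at 5-minute averages.
// A steadily climbing used_percent across windows hints at a memory leak.
public class MemoryLeakCheck {

    public static void main(String[] args) {
        String flux =
                "from(bucket: \"telegraf\")\n" +
                "  |> range(start: -1h)\n" +
                "  |> filter(fn: (r) => r._measurement == \"mem\" and r._field == \"used_percent\")\n" +
                "  |> aggregateWindow(every: 5m, fn: mean)";

        InfluxDBClient client = InfluxDBClientFactory.create(
                "http://localhost:8086", "my-token".toCharArray(), "my-org", "telegraf");

        List<FluxTable> tables = client.getQueryApi().query(flux);
        for (FluxTable table : tables) {
            for (FluxRecord record : table.getRecords()) {
                System.out.println(record.getTime() + "  "
                        + record.getValueByKey("host") + "  " + record.getValue());
            }
        }
        client.close();
    }
}
```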
Subhodeep Ganguly: 00:09:55.319 Yeah. So in this slide, as I mentioned, we are using InfluxDB, Grafana, and Telegraf, the TIG model, plus an efficient ELK monitoring system. This is also open source; as I mentioned, we use a completely open source model with very, very minimal licensing cost. For the Elastic stack, we use Elasticsearch, Logstash, Filebeat, and Kibana. With the help of this open source monitoring model, we have created a framework which can stream data from around 150 to 200 servers and analyze the full large-scale log processing system whenever there is any issue in the infrastructure. With the help of queries, filtering, and retrieval mechanisms, we fetch the details from this log processing service, and in case of any issue anywhere on the servers, we are able to analyze it through a monitoring UI, that is Kibana. We have built Logstash filters for processing those logs from 200 servers based on specific requirements and specific logic; the Logstash filtering decides what kind of data will be processed and what will be visualized in the monitoring UI. Based on that, the test teams are able to find any issues in the application servers, the web servers, or the DB servers. This is the kind of ELK monitoring model we have used, and we've integrated it into the end-to-end infrastructure that is developed using Java.
Subhodeep Ganguly: 00:11:54.042 Yeah. So in this slide, I'll try to explain the large-scale model we have adopted and how fast this infrastructure is: it can analyze and search any kind of processed data, any kind of exception, any kind of issue, or any kind of breakage in the system, and visualize it accordingly. With the filtering model, you can filter for the relevant data, relevant results, and relevant views in the dashboard, which allows you to drill down further into any issue coming from the processing servers. That's why these three points, scale, speed, and relevance, come into the picture, because the infrastructure currently has 200-plus servers. We are getting multiple requests from multiple teams to adopt this technology and infrastructure, so when the server load goes high, we fine-tune the infrastructure so that it can consume more data and more pipelines, process and analyze it, and show you the result within milliseconds. This is the kind of infrastructure we have developed using Elasticsearch, Logstash, and Kibana. This is the high-level infrastructure model, how the entire infrastructure works: from invoking the test case, to triggering the execution, connecting with different application servers, capturing the data and the logs, saving them to the database, processing the logs on the respective servers, processing and analyzing the results, and then creating a metrics model.
Subhodeep Ganguly: 00:14:09.018 If we look here, I'll just explain it briefly. Whenever you upload your test case to your repository, it can be SVN, Git, Bitbucket, anything, you upload your test cases or your JMX test plan to the repository, and then the test cases are available as JMX. Using a Jenkins job, you can schedule your test, and the performance framework model is invoked by the test case and by the triggering model. Before that, it does N number of pre-processing tasks. Once the pre-processing tasks are over and the infrastructure is ready, it invokes your JMeter performance test, and we have logic built in to increase the number of virtual users on the fly. You can also schedule how many virtual users you want for your test case in a ramp-up manner, and based on that, it starts populating the virtual users over the ramp-up time. It's the same thing we used to do in HP LoadRunner with the controller and VUsers; the controller does the same thing, but as everyone knows, LoadRunner is very costly software, so we implemented the same kind of approach using this framework. This framework is integrated with JMeter. We have not tested it with other performance testing software, but we have received a request to integrate this framework with other testing tools, which we are still working on. We are trying to make this performance framework so mobile and adaptable that any team using another performance testing tool can integrate that tool with our framework, and the framework should adapt to it with very minimal changes.
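As a concrete illustration of the Jenkins-to-JMeter hand-off described above, here is a minimal sketch assuming the framework shells out to JMeter's non-GUI mode. The property names `threads` and `rampup` are hypothetical user properties that a test plan would read with the `__P()` function, and the `-R` flag shown in the comment is how a master would fan the run out to remote slaves.

```java
import java.io.File;
import java.io.IOException;

// Sketch of a Java framework launching a JMeter run scheduled from Jenkins.
// For a distributed run, the master could add "-R slave1,slave2" to the command.
public class JMeterLauncher {

    public static int runTest(String jmxPlan, int vusers, int rampUpSeconds)
            throws IOException, InterruptedException {
        Process jmeter = new ProcessBuilder(
                "jmeter", "-n",                      // non-GUI mode
                "-t", jmxPlan,                       // JMX test plan pulled from the repository
                "-l", "results.jtl",                 // raw sample log
                "-Jthreads=" + vusers,               // virtual users (hypothetical property)
                "-Jrampup=" + rampUpSeconds)         // ramp-up time (hypothetical property)
                .directory(new File("."))
                .inheritIO()                         // stream JMeter console output
                .start();
        return jmeter.waitFor();                     // non-zero exit -> mark the Jenkins job failed
    }

    public static void main(String[] args) throws Exception {
        System.exit(runTest("checkout_flow.jmx", 200, 120));
    }
}
```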
Subhodeep Ganguly: 00:16:31.323 With that model, it invokes your test case using JMeter and spins up the virtual users. Then, as you see here, there are multiple servers, which are nothing but the application under test. On each application under test, monitoring agents are installed which stream the data in real time. The data is processed and the details are sent to a server, that is Elasticsearch, which starts processing the data, and our framework sends the aggregate values to a database, which is visualized in our own in-house Angular application. That is how we analyze the results. All these bits and pieces, the application under test, the blocks where JMeter is executing, CrateDB, InfluxDB, and the other bits and pieces [inaudible], are integrated with each other. When a test is executed, everything is monitored and everything starts populating data for the QA and testing teams. Finally, when your test execution is over, it creates results for you, and these results are visible in the different monitoring applications. That is the high-level architecture of our framework.
Subhodeep Ganguly: 00:18:06.936 Apart from that, we have built many kinds of [inaudible] defensive and monitoring tools, in-house tools, which enhance the framework and can take care of any kind of failure in the infrastructure. As I mentioned in the slide, one is a disk cleanup and disk space management tool. We maintain large-scale servers and large-scale infrastructure, and when a test is executed, it saves data close to 4 GB, it can go up to 4 GB, of full transaction logs for one test case. We have to manage this infrastructure with our already-configured systems, so test management is a very major part of this system, because at any time 10 to 12 teams and 10 to 12 key people can be executing test cases, and it can be more, up to 15. Everyone is executing test cases at the same time, and that can flood your servers with a huge amount of data. Our disk space management tool always keeps track of how much data is being saved and how much of the available space has been consumed, and per the logic, it triggers an alert if usage goes beyond a threshold capacity. Then it triggers a rule to free up space based on a time duration, so that results from tests executed, say, 10, 15, or 20 days ago, which the test teams might not access again but which are still lying on your servers and only reducing your available space, get cleaned up. The alerting mechanism notifies you, resolves these issues, and alerts you again once the issues are resolved on your servers. Plus, we have implemented a Slack notification tool for efficient triggering of alerts, not only email alerts; it informs all the stakeholders who are monitoring this infrastructure through Slack.
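A minimal sketch of the disk-space-management idea described above, paired with a Slack incoming-webhook alert. The 90% threshold, the 20-day retention window, and the webhook URL are placeholder assumptions, not SAP's actual values.

```java
import java.io.File;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Sketch: alert when disk usage crosses a threshold, purge old test logs, alert again.
public class DiskSpaceWatcher {

    static final double USAGE_THRESHOLD = 0.90;                      // assumed threshold
    static final long RETENTION_MS = 20L * 24 * 60 * 60 * 1000;      // assumed ~20 day retention

    public static void check(File logDir, String slackWebhookUrl) throws Exception {
        double used = 1.0 - (double) logDir.getUsableSpace() / logDir.getTotalSpace();
        if (used < USAGE_THRESHOLD) {
            return;                                                   // plenty of space, nothing to do
        }
        notifySlack(slackWebhookUrl, String.format(
                "Disk usage at %.0f%% on %s, purging logs older than 20 days", used * 100, logDir));

        long cutoff = System.currentTimeMillis() - RETENTION_MS;
        File[] files = logDir.listFiles();
        if (files != null) {
            for (File f : files) {
                if (f.isFile() && f.lastModified() < cutoff) {
                    f.delete();                                       // free space held by old test logs
                }
            }
        }
        notifySlack(slackWebhookUrl, "Cleanup finished on " + logDir);
    }

    static void notifySlack(String webhookUrl, String text) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(webhookUrl).openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        byte[] body = ("{\"text\":\"" + text.replace("\"", "'") + "\"}")
                .getBytes(StandardCharsets.UTF_8);
        try (OutputStream os = conn.getOutputStream()) {
            os.write(body);
        }
        conn.getResponseCode();                                       // complete the request; fire-and-forget
    }
}
```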
Subhodeep Ganguly: 00:20:37.008 Next is the replay tool, which is a major part of this framework; it eliminates any data loss and makes the logs more accurate for the test teams. Next is resilience and disruptive testing. We have done a POC on this. Resilience testing goes beyond your stress and stability testing; it's a kind of noise testing. Your application may be verified for stress and stability, but there are still some X factors which can break your system in production if they are not tested. This is a concept we took from Netflix, a kind of Chaos Monkey model. We did a POC, but it has not been implemented for all projects yet, because we are still exploring how it can benefit the stakeholders. As part of the POC we have seen how it can be implemented, so we are keeping this model ready and planning to deploy it once the requirements are finalized. Also, the overall infrastructure health check runs 24/7 without any manual intervention, and we have developed a resolver tool to fix those alerts. It is based on a model which can analyze whether there are any issues and also resolve them. Previously, we would analyze, get alerts, and then do manual work to resolve those alerts, but we found an improved way which can also resolve the alerts when they are triggered. We have built this kind of intelligence so that in case of any alert, any application being down, any process broken, any infrastructure monitoring agent not working, or any other issue in the system, it can find it and then trigger logic to resolve the alert, so that it does not come again, because the test teams are running their test cases 24/7. Any issue in the infrastructure would affect their test execution, so this automated resolver tool helps all the teams execute their testing seamlessly and efficiently.
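A minimal sketch of the "alert resolver" idea described above: probe a service's health endpoint, and if it is unreachable, attempt a restart instead of waiting for a human. The health URL and the systemctl service name are illustrative assumptions, not SAP's actual setup.

```java
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch: detect a downed service and try an automated fix before escalating.
public class AlertResolver {

    public static void checkAndResolve(String healthUrl, String serviceName) {
        if (isHealthy(healthUrl)) {
            return;                                        // nothing to fix
        }
        try {
            // Attempt an automated fix before paging anyone.
            new ProcessBuilder("systemctl", "restart", serviceName)
                    .inheritIO()
                    .start()
                    .waitFor();
        } catch (Exception e) {
            System.err.println("Automatic restart of " + serviceName + " failed: " + e.getMessage());
            return;                                        // escalate via the normal alert channel
        }
        System.out.println(serviceName + (isHealthy(healthUrl)
                ? " recovered automatically" : " still unhealthy after restart"));
    }

    static boolean isHealthy(String healthUrl) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(healthUrl).openConnection();
            conn.setConnectTimeout(3000);
            conn.setReadTimeout(3000);
            return conn.getResponseCode() == 200;
        } catch (Exception e) {
            return false;                                  // unreachable counts as unhealthy
        }
    }

    public static void main(String[] args) {
        checkAndResolve("http://localhost:8086/health", "influxdb");
    }
}
```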
Subhodeep Ganguly: 00:23:06.600 We also have Dynatrace monitoring enabled for our performance engineering stack. What this tool does is deploy Dynatrace to all 200 servers in one click. Previously, when Dynatrace needed to be installed, if someone has worked with Dynatrace they know what has to be done: you have to deploy Dynatrace based on each machine's configuration, and you have to go to each and every machine and run the installer. But we have come up with an intelligent solution which identifies your server requirements and what kind of Dynatrace agent needs to be installed; you only have to give the server names, and it will install the agent, start it, and start streaming data to Dynatrace in real time. This is the kind of model we follow for maintaining the infrastructure.
Subhodeep Ganguly: 00:24:09.214 This slide shows the overall architecture, what kind of model we use. We have load generators and run-time statistics generation. This is CrateDB, which stores the aggregate results. The Telegraf agents push data to and from all the application servers; they create metrics related to the systems and any kind of event for a system. Events like an exception being triggered by the system, or memory usage going high, can be pushed to the respective database. At the time of generating the aggregate results, our core framework picks the data from this events database and creates an analytical result, a comparison result. This is the overall architecture; it's very specific to the project. We have multiple load generators, sufficient load generators, which pick up your test dynamically, and we also maintain the CrateDB cluster, the ELK cluster, the TIG cluster, and other necessary tools. So this is a brief overview of the performance framework. It supports real-time performance testing using multiple load generators, and during your testing, you can see real-time performance test result visualization. It picks up your aggregate results and triggers a rule to certify your test. Suppose you're executing a performance test consisting of N scenarios, say 200 scenarios; out of those 200, how many test cases have passed and how many have failed? Based on that, it triggers a rule which certifies whether your test has really passed or not. It does this at run time whenever your test execution finishes. Based on the whole transaction list, it triggers the rule and shows the status of that test case in the monitoring UI. That is a brief about this framework model.
Subhodeep Ganguly: 00:26:54.360 Again, here is a simple working model of the framework. You can see this is a Java-based framework. Once the test is executed, it triggers the performance test on the AUT, the application under test, and it connects to all the application agents and saves the data to InfluxDB. Here, you can see we have InfluxDB and also CrateDB, and these two DBs perform different types of work. InfluxDB captures the raw data from a performance test. Once the raw data is stored, the performance framework creates aggregate values from this raw data, and it shows the data in a human-readable format in a visualization app, which in our project is developed in Angular; it could be built in any language. The main point is that once the raw data is stored, we capture it, process the results, and create the aggregates; the custom reports become visible in the visualization app, while Kibana and Grafana work as monitoring tools for real-time data processing. That is the working model.
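A minimal sketch of the aggregation step just described: turning raw per-sample response times (already stored by the test run) into the summary values shown in a visualization app. The choice of the 95th percentile is an illustrative assumption, not necessarily what SAP's framework reports.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch: build aggregate statistics from a list of raw sample durations.
public class AggregateBuilder {

    public static String summarize(String transaction, List<Long> responseTimesMs) {
        List<Long> sorted = new ArrayList<>(responseTimesMs);
        Collections.sort(sorted);

        double avg = sorted.stream().mapToLong(Long::longValue).average().orElse(0);
        long p95 = sorted.get((int) Math.ceil(sorted.size() * 0.95) - 1);   // 95th percentile sample
        long max = sorted.get(sorted.size() - 1);

        return String.format("%s: samples=%d avg=%.1fms p95=%dms max=%dms",
                transaction, sorted.size(), avg, p95, max);
    }

    public static void main(String[] args) {
        System.out.println(summarize("Login", Arrays.asList(120L, 135L, 160L, 410L, 150L)));
    }
}
```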
Subhodeep Ganguly: 00:28:31.731 In this slide, you can see what kind of framework model it is and what benefits and scalability it provides us. The replay tool is a self-defensive logic; it supports real-time performance test log processing, and it eliminates any loss of transaction data. The Angular visualization application we use creates a test summary report and validates different types of response times, query statistics, and exceptions. The TIG and ELK stacks collect the performance results and provide a high availability model in case any servers go down. The high availability model is only available with the InfluxDB Enterprise version, but we are trying to create a high availability model with our open source stack; it's not fully ready yet, but we are working on a high availability model for the TIG stack. For the ELK stack, the high availability model is already built: we maintain around five Elasticsearch servers, and in case any server goes down, load balancing comes into the picture and the data is pushed to the remaining available Elasticsearch servers. At the end of your execution, all five Elasticsearch servers hold equivalent data, so the framework can pick the raw data from any of the servers and prepare the aggregate result. These are the types of high availability models we have already implemented for the Elasticsearch servers and the ELK stack.
Subhodeep Ganguly: 00:30:42.729 And this is the Dynatrace enablement. One more thing I wanted to mention is that this framework is very adaptable to any kind of cloud infrastructure. Currently, we are [inaudible] in AWS and GCP, and any team doing performance testing with an open source model can adopt this performance testing framework on any cloud infrastructure. With very minimal changes, this infrastructure can be used or rebuilt on any of the other cloud stacks. At the same time, it supports load, stability, volume, and API testing. You can run your test as a holiday volume test for three days and the data will be consistent; using the replay framework, at any point during those three days, in case you have any issue in the infrastructure, the replay framework will take care of any failures and make sure you get an accurate result at the end of your execution. And as I mentioned at the beginning of my presentation, it is fully based on an open source model; the licensing cost is very, very minimal.
Subhodeep Ganguly: 00:32:07.313 In this slide, I have tried to show a few of the dashboards we usually visualize. If someone has used LoadRunner, LoadRunner has a console UI and some server monitoring tools which have to be set up through the LoadRunner UI, and you also have to deploy those tools on the respective servers. Here, as we're using JMeter and the other open source tools, this is a dashboard from Grafana, backed by InfluxDB. You can see how many active users are currently executing your test case, how many users have started and ended, the pass and fail percentages, and also the average throughput. All these graphs stream continuously: at the start of your execution this dashboard starts populating data, and throughout the execution it shows the real-time data. In case of any issue anywhere in the infrastructure, you will be able to find any bottlenecks that exist. The real UI looks somewhat different, but as this is a sample UI, I wanted to highlight the main concepts.
Subhodeep Ganguly: 00:33:41.542 Here, you can see the requests and responses per second. You can select your application, select your test suite, and select any of the graphs you want to visualize at a 1-minute, 1-second, 30-second, or 5-minute interval. Once you select all these things, the UI shows the scenarios you want to visualize in the application. So this is a scenario-based dashboard and graph. It can be configured and developed based on your requirements; we get different types of requirements from different testing teams, and based on that, we populate and develop these graphs. Currently it is stable, so every team is using the same dashboard, and if any new requirement comes in, we can fine-tune the dashboards.
Subhodeep Ganguly: 00:34:59.316 This is the high-level architecture. Here, I wanted to explain how this execution works. You can see here the application servers and the JMeter slaves. The JMeter slaves are hosted on different load generators, and there is one JMeter master, the master node, which connects to one or more slaves. It picks any of the slaves, starts executing the test on it, connects to the respective database, and then starts populating data to the Grafana server, which is visible to the user. If any issue is found on the application servers, for example if the slave where your real-time test is executing goes down, the replay framework again comes into the picture; it takes care of any failures on the slaves and starts capturing the data, so that whenever the slave comes back, it reconnects at that failure point and continues capturing data from there. It's kind of a pub-sub model, you could say. So this is the server performance monitoring model for the framework.
Subhodeep Ganguly: 00:36:33.949 Yeah. So here comes the slide where we came across the requirement: why do we need to develop a replay framework? This slide describes the kinds of use cases it can solve. Suppose in your application you face database failures, network failures, connectivity failures with your application under test, disk space failures, or any other failure which affects your execution; if you face these kinds of failures, will you be able to continue? Suppose you're running your test for N hours. If you hit any of these errors, how do you continue the execution? Do we have any model which can recover those transactions for your automated testing? We also considered situations where the test execution is time- and resource-costly. What I mean is that your test execution is running for three days, the holiday volume test I mentioned, and resources are involved in executing and constantly monitoring these tests for those three days; one resource is constantly monitoring one or more tests. Suppose at any point during those three days some part of the infrastructure goes down, then what happens? Do I need to start again from scratch? Can I prevent these kinds of failures, or reduce the possibility of these issues? As I mentioned in the third point, should I start the same test case again from scratch, or is there an alternative? And the next point: suppose you do start the test case again from scratch; do you have any guarantee that you will get an accurate result if you meet such failures again? Do you have any guarantee that these failures will not come back? And if they do come, do you have an architecture which can take care of them, so that you still get your test result, pass, fail, or the full transaction list, to validate that the performance of the application is stable? Do we have any kind of framework for that? So that is also one of the use cases.
Subhodeep Ganguly: 00:39:23.936 Plus, if any of the infrastructure issues I mentioned in point one occur at any time, you won't be able to view your run-time results. But it is very much required that you can validate your run-time results at every point throughout the duration. So can you save that time? Can you save that effort, the effort of the resources, so that you get the run-time results without any issues, without any breakage? Those are the questions which came to our mind, and it was for these that we developed this particular framework.
Subhodeep Ganguly: 00:40:08.541 I see some questions coming in; I will answer these at the end of my presentation, I hope that will be fine. So these are some of the use cases which came up for us. For these, we found the requirement to develop this kind of framework, which will be efficient for any large-scale project.
Subhodeep Ganguly: 00:40:39.259 Here is the automated solution approach we have used. You can see here the JMeter performance framework: InfluxDB is streaming the results to Grafana, and InfluxDB is getting the data from multiple monitoring agents. If there is any issue in the infrastructure, these links are broken and you see null values in all the applications, and your effort is gone, because this testing is very resource- and time-costly, and since it is performance testing, you have to verify each and every spike in the results on your dashboard. If at any point you see a spike is missing or there is a breakage in the graphs, your performance analysis is not as effective. So here comes the requirement for an automated solution which can take care of these issues. We followed a model similar to Kafka's. Kafka is a pub-sub model, so we tried to create a structure like Kafka: we used a queue concept, a message broker concept, and a data store concept, which together handle this automated infrastructure solution seamlessly. Why did we not use Kafka itself? Because Kafka and ZooKeeper again need their own server management; you have to manage specific servers for Kafka and for ZooKeeper. Suppose that infrastructure is also not working, then what happens? We wanted to minimize any [inaudible] load of running yet another server, because our requirement was quite simple: in case of any issue in the infrastructure, how do you guarantee that all the teams will get accurate test results? So we followed this model: whenever there is an issue in the infrastructure, it invokes the replay framework, which takes care of the data processing. It is horizontally scalable and fault tolerant. In this model, it can save the test results and reconnect to the database system whenever the database comes up. It is a kind of topic queue we have implemented with this replay framework: whenever there is an issue in the infrastructure, the consumer threads subscribe to this queue, the data processing model starts pointing to this queue, and whenever the infrastructure comes back up, the data is consumed from the queue again.
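A producer-side sketch of the queue idea described above, and of the "temporary flat/CSV file compatible with InfluxDB" mentioned in the webinar summary: try to write a JMeter sample to InfluxDB as line protocol, retry up to three times, and if the database is still unreachable, append the record to a local flat file that acts as the replay queue. The URL, token, org, bucket, file path, and the `jmeter`/`transaction` names in the example record are placeholder assumptions.

```java
import com.influxdb.client.InfluxDBClient;
import com.influxdb.client.InfluxDBClientFactory;
import com.influxdb.client.WriteApiBlocking;
import com.influxdb.client.domain.WritePrecision;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Collections;

// Sketch: write-or-buffer logic so a running performance test never loses a sample.
public class ReplayProducer {

    private static final int MAX_RETRIES = 3;
    private static final Path REPLAY_FILE = Paths.get("replay-queue.lp");   // flat-file queue

    private final WriteApiBlocking writeApi;

    public ReplayProducer(InfluxDBClient client) {
        this.writeApi = client.getWriteApiBlocking();
    }

    public void write(String lineProtocol) throws IOException {
        for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
            try {
                writeApi.writeRecord(WritePrecision.NS, lineProtocol);
                return;                                   // stored in InfluxDB, nothing to replay
            } catch (Exception e) {
                // Database or network problem: fall through and retry.
            }
        }
        // Still failing after the retries: divert the record to the flat-file queue so
        // the test keeps running and the transaction is not lost.
        Files.write(REPLAY_FILE,
                Collections.singletonList(lineProtocol),
                StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        InfluxDBClient client = InfluxDBClientFactory.create(
                "http://localhost:8086", "my-token".toCharArray(), "my-org", "perf-results");
        new ReplayProducer(client).write("jmeter,transaction=Login responseTime=235i");
        client.close();
    }
}
```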
Subhodeep Ganguly: 00:44:11.168 This is a kind of flowchart, the real working model of our infrastructure. You start your execution, and it downloads your performance framework. Initially, it checks whether your database is down. If it is not down, that is, the flag is false, it checks the data on your server: if there is data from tests which have already been executed, whose results are ready, and where there are no issues in the infrastructure, and it is older than a specific number of days, that means we do not need those old logs, so it deletes them. As I mentioned, you have to manage the infrastructure with limited capacity, so we cannot store very old logs. If the test execution is already over and the replay logs have already been processed, it first checks whether those logs are really important and how old they are; if they are older than the configured duration, it deletes them. Then it continues: it creates the file system, starts the queue, and starts writing the transactions to this file system. Now, during the execution, this loop runs at any point of time to check whether your database is down; here I have written "check if InfluxDB is down", but by this I mean it also checks whether any other part of the infrastructure is down. If the database or any infrastructure is down, it sets up the queue and retries the same logic up to three times. If it is still not able to connect to the same log processing model, it invokes the replay framework, which takes care of capturing the logs from that point in time, and it continues. The performance results are handled by one thread and the replay framework is handled by another thread, so your performance test, if you can understand, is not affected; the performance test continues, and the replay framework, invoked via this log file, takes care of capturing your transaction log data. At the same time, it keeps checking whether your database has come back up and whether all the infrastructure is okay. It also records the downtime: from when the infrastructure went down until it came back up, it captures the downtime, so it can tell the users whether the infrastructure was completely okay with no downtime, or whether there was a downtime of 5 minutes, or of 1 day. Suppose it was unavailable for more than a day, it stores that downtime and captures all these records. Once the DB comes back, it starts consuming the data: the producer thread has been filling this replay framework with data, and once your infrastructure recovers from its failure state, the consumer starts consuming the data and pushes it back to the InfluxDB database. So, as you can understand, your log monitoring and your dashboard monitoring system are constantly up. Grafana, or Kibana, but mainly Grafana, fetches the data either from InfluxDB or from the replay framework.
So whenever this framework identifies a failure in the system, it points the back end to the replay framework, and it works in real time; you are not losing any data. Whenever the infrastructure comes back, it just pushes the data back to the real infrastructure. That is the model it works in, and in the end the users get the result in the dashboard. So it's a replacement for Kafka, as I'd put it, which was developed by LinkedIn. It works for any performance framework, and it can work with any kind of database. We have developed this framework in such a mobile and adaptable manner that it can be adapted to any database used for storing your test results or your transaction logs: time series, RDBMS, or any kind of open source database. So this is the overall flowchart of the replay tool.
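To complete the sketch, here is a hedged consumer-side counterpart to the earlier producer example: once InfluxDB is reachable again, read the buffered line-protocol records from the flat-file queue, push them back into the database, report the downtime window, and clear the file. The connection details and file path are the same placeholder assumptions as before.

```java
import com.influxdb.client.InfluxDBClient;
import com.influxdb.client.InfluxDBClientFactory;
import com.influxdb.client.domain.WritePrecision;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Sketch: drain the flat-file replay queue back into InfluxDB after an outage.
public class ReplayConsumer {

    private static final Path REPLAY_FILE = Paths.get("replay-queue.lp");

    public static void replay(InfluxDBClient client, long downSinceMillis) throws IOException {
        if (!Files.exists(REPLAY_FILE)) {
            return;                                         // nothing was buffered during the outage
        }
        List<String> buffered = Files.readAllLines(REPLAY_FILE, StandardCharsets.UTF_8);
        for (String lineProtocol : buffered) {
            client.getWriteApiBlocking().writeRecord(WritePrecision.NS, lineProtocol);
        }
        Files.delete(REPLAY_FILE);                          // queue drained, remove the backlog file

        long downtimeSeconds = (System.currentTimeMillis() - downSinceMillis) / 1000;
        System.out.println("Replayed " + buffered.size()
                + " records after a downtime of " + downtimeSeconds + "s");
    }

    public static void main(String[] args) {
        InfluxDBClient client = InfluxDBClientFactory.create(
                "http://localhost:8086", "my-token".toCharArray(), "my-org", "perf-results");
        try {
            // If InfluxDB is still down this throws, and the backlog file is kept for a later attempt.
            replay(client, System.currentTimeMillis() - 5 * 60 * 1000);
        } catch (Exception stillDown) {
            System.err.println("InfluxDB still unreachable, will retry later: " + stillDown.getMessage());
        } finally {
            client.close();
        }
    }
}
```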
Subhodeep Ganguly: 00:49:23.587 So what benefits do we get? It eliminates the possibility of any transaction or data loss in your automated test execution and in the real-time performance analytics tools. It reduces the effort and hours needed to complete the execution of any test in case of failures, and you get the full real-time data log, just like your database. In case of any issue, if you want to replay the data, you can use this framework: it just starts the queue and the producer-consumer model, and it makes your infrastructure ready again with the already executed data. So it eliminates duplicate test execution effort. As I mentioned, whether you're executing a test for 3 hours or 3 days does not matter; it will capture the logs as long as you have capacity on your server, and the capacity of the server is managed by our automated disk space management tool, which always keeps your server ready to consume data. With this approach, it improves the overall test execution efficiency and fixes any kind of DB failure with minimal effort. It also acts as a high-accuracy model for metrics generation, ensuring you can compare your baseline and comparison tests without any mismatch in the data. In case of transaction failures, the test result would otherwise show issues in the execution; with this replay framework, we minimize that possibility. And last but not least, this tool can be used as a plugin. We will try to make it open source. This tool can be used as a plugin with any time series database or RDBMS with minimal modification.
Subhodeep Ganguly: 00:51:28.000 So this is the overall framework structure and the replay framework we have developed for the project, as a replacement for another pub-sub replay model, Kafka. These are some references you can go through later for more information. Yeah, and the InfluxDB team can tell you more about this particular slide.
Caitlin Croft: 00:51:58.338 Thank you. That was a fantastic presentation. It looks like there are many questions, so I'll just quickly go through this. I want to remind everyone that InfluxDays is coming up in November. It's technically our North America edition, but as I mentioned before, anyone and everyone can join, and it's completely free. The call for papers is still open, so if you have a really good talk that you want to share with the InfluxDB community, please feel free to submit it as well as register for the event. And there will be a Flux training about two weeks prior. All right, we're just going to jump into some of the questions here. A couple of people asked, "Is the performance testing framework available on GitHub?"
Subhodeep Ganguly: 00:52:52.212 Yeah. So I am not able to see who has asked these questions.
Caitlin Croft: 00:52:58.834 Okay.
Subhodeep Ganguly: 00:53:00.734 Yeah. Okay, so to answer this: this is an in-house model, so we have not open sourced it. It is the property of SAP, as we developed this framework for in-house performance testing. If we are allowed to open source it to GitHub, we'll look at that, but as of now, it is not available on public GitHub; it is only in our private cloud.
Caitlin Croft: 00:53:29.176 Okay, perfect. And the slides will be shared on SlideShare later today. So those will be available for review later. Can we see transaction names in Grafana via InfluxDB?
Subhodeep Ganguly: 00:53:45.866 Yeah, action names means transaction names, right, if I’m not wrong?
Caitlin Croft: 00:53:50.667 Pardon. What was that? Transaction names?
Subhodeep Ganguly: 00:53:53.728 Yeah, yeah. So you will be able to see the action names, whatever is stored in InfluxDB. You can definitely see them in Grafana: if your performance framework is working on any of the actions you are monitoring, those actions will be saved in the database, and you have to write a query to visualize those actions in Grafana. You have to write an efficient query so that those particular actions are visible in Grafana. So you can definitely do that.
Caitlin Croft: 00:54:26.234 Okay, perfect. Do you experience high response times reported when you push test results from InfluxDB to Grafana via JMeter local execution, singleton JMeter local execution?
Subhodeep Ganguly: 00:54:48.146 So response time will be - so the response time is managed by JMeter. So during your execution if the throughput is high, then the response time will be definitely high. So the to and fro calls, if it is taking more time, so response time will be definitely high. So that depends on your application under test, whatever the scenarios you’re creating. So if that particular scenario, to execute that particular scenario, if the response time is high, definitely, that will be visualized in the Grafana model. But InfluxDB and Grafana does not cater to the high response time as I know. It depends on your test plan, the transaction and how your system performs. Yeah?
Caitlin Croft: 00:55:44.939 Perfect. Perf Qual is your performance qualification framework based on Java. Can you please share more details about it? Have you built your JMeter master-slave mechanism, and how are you controlling JMeter slaves at run-times? Are the containerized JMeter slaves?
Subhodeep Ganguly: 00:56:09.208 Okay. May I know who has asked this question?
Caitlin Croft: 00:56:12.458 Unfortunately, I don’t know who asked it.
Subhodeep Ganguly: 00:56:15.936 Okay. Yeah. So definitely, if they’re having these queries, so they can write back to us. They can write back to me, and definitely, I’ll be able to share more details on this particular query - on this particular architecture.
Caitlin Croft: 00:56:36.736 Okay. So for the person who asked that, you should have my email address. So feel free to email me and I’m happy to connect you and you can dive into that a little bit more deeply. Do you have a single dashboard for JMeter test results and all other system metrics in Grafana?
Subhodeep Ganguly: 00:57:02.084 Yes. In Grafana, we visualize all of this metrics monitoring on a real-time basis. We have a single-page dashboard for all the real-time metrics: during the execution, whatever the aggregated metrics are, they are visualized on one page only.
Caitlin Croft: 00:57:24.738 Okay, perfect.
Subhodeep Ganguly: 00:57:25.544 So the answer is yes, we have. Yeah.
Caitlin Croft: 00:57:28.947 Great. Does the replay tool work on all infrastructure components in your system under test, or is it limited to the JMeter master-slave plus InfluxDB?
Subhodeep Ganguly: 00:57:46.345 No. As I mentioned, we have implemented this replay framework in such an adaptable manner that it can be used with any kind of automation testing or performance testing model. Based on the requirement, we can fine-tune the logic inside the replay framework, and then it can be used in any kind of large-scale automation or performance testing project. It's not limited to JMeter only.
Caitlin Croft: 00:58:14.190 Okay, great. How do you maintain test data from different systems?
Subhodeep Ganguly: 00:58:24.910 So for test data, we actually keep the test data in the test plan. The test plan takes care of capturing all the test data, and it executes your test scenarios based on that data. So it's based on the test plan.
Caitlin Croft: 00:58:48.368 Okay, great. And then someone else asked if this session has been recorded. Yes, it has been. It will be trimmed and available for replay later today, so by tonight it will be available to watch, and you can also review the slides. Just go back to where you registered for the webinar later tonight, and the recording will be there available for replay. Thank you everyone for attending this webinar and submitting some fantastic questions. Hope to see you again at another webinar, and thank you, Subhodeep, for presenting.
Subhodeep Ganguly: 00:59:33.182 Yeah, thanks all. Thanks for attending. Thanks, Caitlin. Thanks to the team.
Caitlin Croft: 00:59:36.882 Thank you, everyone. Bye.
Subhodeep Ganguly: 00:59:39.115 Thanks.
Subhodeep Ganguly
Senior Developer - Performance Engineering, SAP
Subhodeep works at SAP and has sound knowledge of implementing automation methodologies, framework design, and framework customization using different types of design patterns. He worked on the development of an in-house end-to-end performance testing framework using Java 1.8, involving the ELK and TIG stack tools and JMeter as the load testing tool. He has been involved in feasibility analysis, test estimation, ROI calculation, test strategy development, and test execution for different types of web-based applications. He has very good working knowledge of automation tools like HP QTP (UFT), Selenium, and Robot Framework, HP LoadRunner and JMeter as performance test tools, and HP ALM.