How Switch Uses InfluxDB to Feed Site Artifacts into a Machine Learning System

Session Date: Sep 04, 2018
Time: 8:00am (PT) | 3:00pm (GMT)

Switch’s automation technology helps issuers recapture and increase card revenue by keeping cards Top of Wallet®. Using best-in-class security protocols, Switch discovers where cards are used for payments, navigates to login pages, and adds new or updated payment cards on behalf of each user. Understanding how these tasks perform for the end user is important in ensuring they provide an effective solution. In this webinar, Gary Tomlinson, Director of R&D at Switch, will share how they stream anonymized crowd-sourced site artifacts into InfluxDB for analysis and resolution.

Watch the Webinar

Watch the webinar “How Switch Uses InfluxDB to Feed Site Artifacts into a Machine Learning System” by filling out the form and clicking on the download button on the right. This will open the recording.

[et_pb_toggle _builder_version="3.17.6" title="Transcript" title_font_size="26" border_width_all="0px" border_width_bottom="1px" module_class="transcript-toggle" closed_toggle_background_color="rgba(255,255,255,0)"]

Here is an unedited transcript of the webinar “How Switch Uses InfluxDB to Feed Site Artifacts into a Machine Learning System”. It is provided for those who prefer reading to watching the webinar. Please note that the transcript is raw. We apologize for any transcription errors.

Speakers:

Chris Churilo: Director Product Marketing, InfluxData
Gary Tomlinson: Director of R&D, Switch
Katherine Chavez: Director of Marketing, Switch

Chris Churilo 00:00:00.391 All right. It’s three minutes after the hour, so we’ll get started. And good morning, good afternoon, good evening to you. Thank you so much for joining us on our webinar this morning. And today we have a webinar from one of our users called SWITCH-or a company called SWITCH-and we have both Gary and Katherine that are going to be reviewing what they actually build and how they’re actually using InfluxDB. I want to remind everybody if you do have any questions, please put your questions either in the chat or the Q&A, which you can find those buttons in the bottom of your Zoom application. And if you really want to speak to them at the end of the session, I can unmute everybody. We can have a conversation that way. I also want to remind everybody that this session is being recorded. After I do a quick edit, then I will be sharing the recording with everybody so you can take another listen to it. And with that, I’m just going to hand it over to Gary and Katherine. Off you go.

Gary Tomlinson 00:00:59.906 Thank you, Chris. Katherine, would you like to lead off for us?

Katherine Chavez 00:01:03.587 Yeah. For sure. So as Chris mentioned, our name is SWITCH. And you can learn a lot more about what we do and get more information at www.switchme.com if you’re interested. We’re a Seattle-based startup with 18 members. Most of that team is developers, and we’re actually working out of the Pioneer Square part of Seattle, which is pretty big in the startup space. So we are a startup. We started in 2014. But most of that time since then, we have been slowly working on the platform that we created. And what we do is we design and build robotic process automation solutions capable of user-assisted placement of payment cards on sites. So the idea behind that is that there are over a billion credit cards that were issued just last year, and what we’re trying to do is make sure that-as cardholders, we all have to replace our card on every single site if we get a new card. So we have to get the new card, activate it, then go out to a bunch of sites and put it on those sites every single time we get a new card. And with fraud and everything like that that’s happening, we’re all getting our credit cards replaced a lot. And our online ecosystem is getting more and more complex with the number of accounts that we have. We buy our groceries online, we buy our clothes online, we pay for everything that we have and for our utilities online. And a lot of times, we do that with credit cards. So what we wanted to do is sort of just simplify that, and make it so that it’s less complicated to get those cards back on file. Our customers are financial institutions and issuers, and our potential partners are financial technology providers. Significant revenue is lost by card issuers and merchants when reissuing events occur and replacement cards are activated but not used. And with the growing number of cards in every wallet and online merchants storing cards on file, the task of managing card-to-site relationships is really complex, as I mentioned.
So SWITCH technology allows issuers to capture lost revenue and enhance convenience and security for their cardholders. We also have a platform that helps issuers get their cards into use immediately and automatically, to update online payment profiles, and activate new or replacement cards. The technology provides visibility to track and increase usage of newly-issued cards.

Katherine Chavez 00:03:42.830 We have a few products, as you can see. We have an ongoing beta product that is available direct to consumer. You can access it on our website, switchme.com. It’s our full-featured app and gives access for users to really get all of the benefits that we can provide from the platform that we have in an app. But it’s a web-based application with an extension in Google Chrome. And like I said, you can use it at any point in time if you wanted to sign up for the beta. It helps you log in to your online accounts, keep your passwords, and manage all of your payments online. We also, more recently, in the most recent product that we launched was CardSavr API, we packaged the platform that we have in an API to make it easier to provide it to issuers and to any financial technology partner. And the SWITCH platform already supports thousands of online merchants without individual opt-in integrations by using anonymized crowd-sourcing techniques-which Gary will get more into-coupled with a machine learning engine. The number of supported merchant sites that we support grows every day, and the underlying technology is packaged in an API or an app. So just wanted to make that clear. We try to make sure that in the API version, we give issuers or our financial technology partners full ability to manage the user experience 100%. And Gary can kind of build on that and get more into the depths of our technology itself.

Gary Tomlinson 00:05:36.987 Thank you, Katherine. Yeah. So life began here at SWITCH trying to figure out how to do this stuff with machine learning. And like Katherine said, we started about four years ago. I’d say the first two years were mostly trying to sort out how you would approach this problem. So whole bunch of approaches were taken when the company was very, very small, while this was going on. And it led us into this TopWallet application, which really is a consumer-oriented app, where the person has a lot of control over their environment to do all this. But it can be branded for various financial institutions. And so we have been working with some because they kind of like this TurnKey app approach. And so this diagram that I have up here, I’m going to try and explain sort of just the big picture view of how this system works, as I’ve listened to a number of the webinars for Influx and really appreciate Chris reaching out to us. So the way that-look on the left. There’s a TopWallet application on top, so that’s the consumer-oriented one. That’s the one that runs in Chrome. We also have some stuff running on mobile, as well, that should be coming out here in a bit. And then, below that, I showed two others. So financial institution apps, these would be the CardSavr style. So we’ve also packaged it not only with an API but with an SDK, where some of the capabilities that are in TopWallet can be built into financial institution apps. So for example, we’re working with some of the large card issuers in the United States to be able to integrate their systems. And actually, CardSavr is an outgrowth of that. So in the beginning, it was just kind of the TopWallet. And a lot of financial institutions, the card issuer said, “Hey. It’d be cool if we could have APIs, where we could do things. We already know who all the cardholders are, we have relationships with them, we know all their billing addresses. 
It would be really great if we could just digitally provision all that to them, notify them, cooperate with the cardholders to describe their merchant site ecosystems, and be able to basically assist in getting the cards updated, and billing addresses and whatnot.”

Gary Tomlinson 00:08:09.375 And then, on the bottom, this is kind of a newer one that’s happening. Merchants actually have reached out to us. Some of the larger merchants in the United States. They have their own co-branded cards. You’re probably familiar-Amazon has their Prime, and that’s co-branded with Chase, right? There’s other people, Cabela’s, Walmart, Nordstroms-let’s see. Who’s the travel agency? Expedia. The list goes on and on. So we’ve been approached by a number of merchants saying, “Hey. We really want some help, too, because we’ve got a lot of cards.” And there’s several large flips going on right now. Walmart is a big one that’s coming up here in a bit that you might be reading about. There’s some other big flips. Cabela’s has a big flip that’s going to be starting. Anybody that’s a Cabela’s member probably is aware of that. And there’s a number of other flips that are on the horizon. So oddly enough, we even have merchants that are now cooperating with us. So if you look up on top, you’ll see that TopWallet has an arrow going to InfluxDB. So this is actually kind of the main theme today. That’s why Chris reached out to us.

Gary Tomlinson 00:09:29.744 So early on, one of our strategies to learn about sites-now, how would you go about figuring out, “Well, I wonder what sites people use that would be of use?” Right? I mean, if it’s just you against the whole internet, that’s kind of daunting. So we realized that we could use our users. If we anonymize their data, we could crowdsource that and learn about sites that they go to that they have commercial relationships with. And so the idea is that when users on the TopWallet application are cruising around and running into sites and sites that they log into, we can notice that. And then, we will start to stream in information. I’ll go into how that information works in a little bit. But the flow here is we stream that up into-and it’s a hosted InfluxDB, so it’s the cloud version. And so we stream all that up. Then our machine learning systems-well, they’re all over the place. But part of them are running-well, essentially, all of our stuff runs up in Amazon’s cloud. So we’ve got an Elastic Container Service cluster of a whole bunch of machine learning systems that run in their own virtual private cloud. And basically, they analyze data that’s coming in from InfluxDB. And they basically learn about the systems, push that into a sites directory, which then is feeding our main service. So the CardSavr service itself is behind the scenes, running in-well, it depends. So there’s an AWS per tenant. There’s one for TopWallet itself, where we manage all that for the-well, we manage the service for the end users. And then, if you’re running with financial institutions or merchants, those are coming into their own virtual private cloud tenants. So they’re logically all isolated from each other. We have to do that for PCI compliance. 
And so basically all the apps-the TopWallet app, the financial institution apps, and merchant apps-they all communicate with the CardSavr back end, which then has-basically, what it has is a whole bunch of, well, virtual browsers in support of financial apps and merchant apps. In a TopWallet case, there’s no virtual browser necessary because the browsing application is built into the TopWallet app. So it does it on its own behalf. And then, essentially, the remote process automation, which runs in the virtual or the real browser, interfaces with the merchant sites in behalf of an end user. So they can log in, impersonate you, move around, update your billing information or your card information, or they can replace a card. Essentially, all of that. So anyway, if you kind of see the big picture, I think the hosted InfluxDB actually runs in Amazon, too, if I’m right. Is that right, Chris? I think that’s true.

Chris Churilo 00:12:49.176 Yep. It does.

Gary Tomlinson 00:12:50.551 Yeah. So I probably should have drawn yet another little box around that, too. I just didn’t think about it. So that’s kind of a big picture. So let’s walk back down. So we had some needs, right? And so when we first started, we didn’t know what to do. Right? It’s like, “Mm. This is a good idea. Let’s figure this out.” So we just discovered as much information as we could. And so we had a lot of unstructured data, and we were using some classic mining techniques, and we kind of realized those really are not the right approach after a little while. Just streaming lots of information and sifting around for it, that didn’t work out so well. So then we started to look at what data would be useful-right?-of everything that we were streaming. And then it became somewhat obvious to us. So the data of interest to us was we wanted to have site navigation for machine learning and then anonymize that. Obviously, we don’t want to have any personally identifiable information in there. We just want to know, of all our users, where are they going-especially, what sites are they going to-that would be possible for us to aid them, using remote process automation? And so that is actually the key part. And then, another thing that we also realized that would be useful for us is error handling data for troubleshooting. Because if something doesn’t work right, it’s nearly impossible to reach out to a consumer and say, “What did you do?” It’s the kind of thing today if you see like, “Hey. This just blew up. Do you want to push a button and send this information?” Microsoft does that, Apple does that, others do that. It’s because they can’t really call you or reach back out to you, so they want to get stuff snapshotted. And so that was something that we were interested in doing ourselves. So those were kind of the two areas that we realized of all the stuff that we were streaming originally, let’s just strip that down to just stuff of interest. 
So number one would be information for our machine learning system. And number two would be information that could help us troubleshoot if someone runs into trouble.
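The two record types described above (anonymized navigation data and troubleshooting data) can be sketched as InfluxDB line protocol records. This is a minimal illustration, not SWITCH's actual schema: the measurement name `site_event`, the `record_type` and `site` tags, the `path`, `clicks`, and `message` fields, and every function name are assumptions made for the example. The only anonymization shown is dropping URL credentials, query strings, and fragments; a real system would presumably need more than that.

```python
from urllib.parse import urlsplit


def escape_key(text: str) -> str:
    """Escape commas, equals signs, and spaces, as line protocol requires
    for tag keys, tag values, and field keys."""
    for ch in (",", "=", " "):
        text = text.replace(ch, "\\" + ch)
    return text


def quote_field(text: str) -> str:
    """Render a string field value: escape backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def strip_identity(url: str) -> tuple[str, str]:
    """Keep host and path only; credentials, query string, and fragment are dropped."""
    parts = urlsplit(url)
    return parts.hostname or "", parts.path or "/"


def to_line(measurement: str, tags: dict, fields: dict, ts_ns: int) -> str:
    """Build one line protocol record with sorted tags and string fields."""
    tag_part = ",".join(
        f"{escape_key(k)}={escape_key(v)}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{escape_key(k)}={quote_field(v)}" for k, v in sorted(fields.items())
    )
    return f"{measurement},{tag_part} {field_part} {ts_ns}"


def navigation_event(url: str, clicks: list, ts_ns: int) -> str:
    """Hypothetical navigation record: where a user went, not who they are."""
    host, path = strip_identity(url)
    return to_line(
        "site_event",
        {"record_type": "navigation", "site": host},
        {"path": path, "clicks": " > ".join(clicks)},
        ts_ns,
    )


def error_event(url: str, message: str, ts_ns: int) -> str:
    """Hypothetical troubleshooting record for something that went wrong on a site."""
    host, path = strip_identity(url)
    return to_line(
        "site_event",
        {"record_type": "error", "site": host},
        {"path": path, "message": message},
        ts_ns,
    )
```

Putting the record type in a tag rather than a field is a design choice for this sketch: tags are indexed in InfluxDB, so a reader of the data can filter on record type cheaply.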

Gary Tomlinson 00:15:16.499 All right. So where did we begin? Well, like a lot of people, we said, “Hey. Let’s just go get some open-source.” “Hey, the Elk Stack is very popular. Let’s try that.” And so we got Logstash, and we put a Logstash server up and hosted that on Amazon, directly on Elastic Computing 2-so that virtual machine style-in Amazon. So we created an EC2 instance, and we put a Logstash into Linux and set it up. And then, we set our TopWallet-or more of experimental TopWallet, back then. “Hey. Let’s just start streaming.” And we named it the Massive Event Log, or MEL, because we didn’t-well, because we’re a bunch of geeks and we didn’t know what to call it. So MEL began life, and we just started streaming everything, all kinds of stuff. Right? “Let’s stream it all over there.” And then, as we started to analyze it, that’s when we realized, “Hmm. That’s probably way too much stuff.” Right? So then, we stripped it down to just the two that I mentioned before, the anonymized navigation information to places on sites-and I’ll talk more about that. That’s kind of interesting how that works. And then, “Hey. Let’s go perform some load testing.” So we started to run load testing on Logstash, and we were extrapolating for how successful we were going to become, and how many consumers would run our app. And quickly, we crushed our first Logstash instance. And so what did we learn from all that? Well, number one, it was really hard to get the learning artifacts and, actually, the troubleshooting artifacts out of a Logstash style. You can do that, but it was a little bit-use your Elasticsearch, and you go strip everything. But it actually was not that easy to get at that stuff, and we were like, “Hmm.” And then, we realized quickly that this wasn’t really going to scale for our needs. And I’m not going to say that you can’t make an Elk Stack scale. I’m sure you probably could, with enough expertise and resource behind it. 
But this didn’t really fit our strategic model, is what we came quickly to realize. Which was you weren’t just going to set up one simple Logstash server and go at it with Kibana and Elasticsearch. And I think, as Katherine mentioned, we’re up to a massive 18 people now, so we’re not a really big company. And you’ll see here in a bit, some of the stuff we’re doing is really of quite large scale. And so, clearly, early on, we made a decision in the modern age and said, “We’re going to go to the cloud, we’re going to leverage the cloud, we’re going to leverage as much open-source as we can. We’re not going to over-invest in non-core intellectual property differentiation. Let’s put almost all our energy into what is it that we do that’s kind of the magic?” And the magic is the machine learning and the remote process automation for people. Right? And not building large Elk Stack infrastructure on Amazon. So that kind of put an end to, probably, going down the Elk Stack.

Gary Tomlinson 00:18:50.463 So now, here they come. InfluxDB. So we were aware of InfluxDB, and we knew that it was being used by a lot of people. And the nice thing was that Influx offers a cloud version, which also was excellent, given our strategy. Like, “Oh. This is a lot better. All we have to do is work with Influx and describe the kind of scale that we’re going to run at, and they come back and provision it all for us, and away we go.” And so we began a test with Influx, and we basically started to test it at the same transaction rates that we pushed our Logstash over. And obviously, all of you that use Influx are aware that that’s not a whole lot of-it’s not too hard for Influx to handle that. And so we’re like, “Wow. This is very good.” Right? And so we converted that from just a test to a production system. Actually, we have a production and we have a test system, both, for use with the TopWallet tenant of ours. And so what did we do? Well, we modified our machine learning system-right?-now, to ingest site navigation events. And so this was a lot easier to do. Right? So that was one of the nice things we realized with the Influx system is that, being time-series-based, it became super easy. Right? So we used the SQL. Very easy. And then, essentially, we would ask for the next time quantum to analyze. So earlier on, I had shown the machine learning system running in AWS in one of our virtual private clouds. So that system-right?-we call it the Harvester. And the Harvester-right?-it’s going to say, “Okay. I’m ready to work on some more data now.” And so it says, “Oh. Where is the last point I left off in time? Okay. And now I’m going to-what’s the next time quantum I want to try and ingest?” Right? And so, essentially, we can go in and pull everything that’s been captured now in that time quantum and begin crunching away. And so this was so much easier than trying to use-what was that thing?-Elasticsearch. Right? It was so easy with SQL to do this.
And of course, we were able to structure our data very nicely, too, which is also another property in Influx. So instead of just seeing it as kind of like a log-right?-we’re able to describe what our records look like. So we had the time series, so we could go from A to B in a time window. And then-very nice, we’re able to say “and”-show us the record types for navigation. Or we’ve also got-show us record types for problems that have arisen. So this was a whole lot easier to contend with. And then, we’ve never been worried about, “I wonder how large of a data set can they store?” They can store really huge, so that has never been a problem for us.
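The Harvester loop described here (remember where you left off, ask for the next time quantum, filter by record type) can be sketched roughly as follows. This is an assumption-laden illustration rather than SWITCH's code: the class layout, the measurement name `site_event`, the `record_type` tag, and the five-minute default quantum are all hypothetical. The query text is written in InfluxQL, the SQL-like query language of InfluxDB 1.x, which is presumably what “the SQL” refers to; bare integer timestamps in InfluxQL are read as nanoseconds since the epoch.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

NS_PER_SECOND = 1_000_000_000
ALLOWED_RECORD_TYPES = {"navigation", "error"}  # hypothetical record types


@dataclass
class Harvester:
    """Tracks the last point in time that was ingested and hands out windows."""

    checkpoint_ns: int
    quantum_ns: int = 300 * NS_PER_SECOND  # assumed five-minute quantum

    def next_window(self, now_ns: int) -> Optional[Tuple[int, int]]:
        """Return (start, end) of the next full quantum, or None if the
        quantum has not completely elapsed yet."""
        end = self.checkpoint_ns + self.quantum_ns
        if end > now_ns:
            return None
        return self.checkpoint_ns, end

    def build_query(self, window: Tuple[int, int], record_type: str) -> str:
        """Build an InfluxQL query for one record type in a half-open window."""
        if record_type not in ALLOWED_RECORD_TYPES:
            raise ValueError(f"unknown record type: {record_type}")
        start, end = window
        return (
            f'SELECT * FROM "site_event" '
            f"WHERE time >= {start} AND time < {end} "
            f"AND \"record_type\" = '{record_type}'"
        )

    def commit(self, window: Tuple[int, int]) -> None:
        """Advance the checkpoint once the window has been fully processed."""
        self.checkpoint_ns = window[1]
```

Using a half-open window (`>=` start, `<` end) means consecutive quanta never overlap or skip a point, and committing only after processing means a crash replays a window instead of losing it.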

Gary Tomlinson 00:22:08.021 And then, we modified our TopWallet app to stream just anonymized navigation events, as well as-I didn’t put on here, as well as things like if something goes wrong and we realized it, we actually do screen capture shots so we can actually capture what the person was looking at [laughter]. And then, we tag those and ship those in, too. And so then, when we come back through to try and resolve something, we can actually look and see what’s going on, and you’d be amazed how helpful that is. You look at it and go, “Oh. Yeah. That’s not right. I think I know what’s wrong.” And away you go. And then, of course, we leave other nuggets in there for forensics, as well. So this was kind of the move over to Influx. Now, I know everyone-machine learning is this hot topic, and I know a lot of-I’ve seen some of the other, and watched some of the webinars with Influx. And I’ve seen others are using machine learning for operational support systems to learn what’s going on, and set triggers, and all kinds of cool stuff. Our particular use case, we haven’t-I’ll talk about the operational support in a little bit. This machine learning is where life began for us. Right? And so it’s actually an interesting-well, for some people it might be interesting. So what we’re up to-if you look here, you’ve got this TopWallet remote process automation, RPA, app. And essentially, what we’re after is navigations to forms of interest, in the form of clickstreams. And so one thing that we learned very quickly was that you can’t just rely on a URL-right?-to get to places. This is not really how the modern web works. And so often, what’s happening is the apps-right?-when you click on things, they navigate around in the DOM, and they magically show you new things, or they-you can’t get to the next part of a form-right?-or you can’t move from form 1 to form 2, often, without going through the clickstreams. Right? And so it’s not at all how you might think. Like, “Oh. 
I’m just going to have this URL, and I’ll capture that other URL, and that’s going to get me all my information.” That’s not really how it works.

Gary Tomlinson 00:24:39.159 So sometimes we can use the URL. Often, we have to use the URL in conjunction with clickstreams. And how do we get clickstreams? People drive those the first time. Right? And so the natural act of any one of us cruising through-right?-we’re grabbing all these clickstreams. Of course, we’re stripping all the information about who the person is that did it. We just want to know they went there. And you can imagine that one of our trigger conditions of like, “Is this a site of interest?” is how many people are going there. Or is it regional? Like, “Oh. A whole bunch of people are going to this utility,” for example. Right? It’s your power company or something. It’s not like Amazon. But it’s very important to people who use that. They’re in a region. This is especially important to some of our financial institutions. So we’re working with them where they’re aware that they are going into smaller businesses. Right? Or smaller opportunities, smaller card issuers, if they want to flip, essentially, like to their bank. And so in that case, there’s lots of regional type of information. It’s not your Alexa 100 top sites. And so you can almost guess the top 50 Alexa sites that are merchants. Well, of course, you’d support them. But then, who do you support afterwards? And that’s where all this crowdsource information really becomes highly useful to us. It’s also useful to us when we’re going to even the large sites because they change all the time [laughter]. They’re always doing A/B-type testing and changing all stuff on us. And they knock us out sometimes-right?-so our remote process agents don’t know what to do for a little bit. And we learn very quickly because not only do we re-scan them all the time, we also pick up these clickstream events coming through and going, “Oh. Interesting.” And then, we walk through and find what they’ve done to us, and then, we relearn very quickly. And so that all goes into the Influx database, like I mentioned.
And then, basically, what we’re trying to do is get that stuff parsed down into an inferencing engine.
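The “site of interest” trigger mentioned above (how many people are going there, or is it regional?) could look something like the sketch below. It is only an illustration under stated assumptions: the thresholds, the coarse region label attached to each anonymized event, and the function name are hypothetical, and the transcript does not say how SWITCH actually scores sites.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def sites_of_interest(
    events: Iterable[Tuple[str, str]],
    global_min: int = 100,
    regional_min: int = 20,
) -> Dict[str, str]:
    """Flag sites from anonymized (site, region) visit events.

    A site is flagged "global" when its total visit count reaches global_min,
    or "regional:<region>" when it falls short overall but a single region
    alone reaches regional_min (the local power company case).
    """
    total = Counter()
    by_region = Counter()
    for site, region in events:
        total[site] += 1
        by_region[(site, region)] += 1

    flagged = {}
    for site, count in total.items():
        if count >= global_min:
            flagged[site] = "global"
    for (site, region), count in by_region.items():
        if count >= regional_min and site not in flagged:
            flagged[site] = f"regional:{region}"
    return flagged
```

Counting per region separately is what lets a small regional utility surface even though it would never rank among the most visited merchant sites overall.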

Gary Tomlinson 00:27:03.997 So this Harvester grabs all that stuff, analyzes the websites, the merchant sites, figures out what they’re up to, and then, generates inferencing information that we can send back and feed our RPA agents again. Our RPA apps, like TopWallet or our own virtual browsers, in the case of pure CardSavr. And so it’s kind of this little ecosystem where people help us learn about new sites in a crowdsource way. And then, our machine learning systems, they crank through, generate all the inferencing information necessary for the site, essentially trying to pattern-match them. Like, “Oh, you look like this,” or, “You look like that.” And this is actually how we as humans work. Right? We kind of learn, we click, there’s paradigms. We realize, “Oh. This is this kind of form,” or, “Ow.” Have you ever run into a form you’ve never seen before? And you go, “Hmm. That’s very frustrating.” And then, you kind of get used to it. Right? Well, that’s essentially what our engines are up to. And the more people and the more ways we come in, and the larger, essentially, the learning sets that are coming in from the crowdsourcing, the greater the inferencing becomes, and the more sites-and it often is like once we figure out how to do one site, there’s-it’s not just that one. They kind of fall into a category, and then, a whole bunch more. Right? “Oh. We figured them out.” So this is kind of our primary use case for Influx right now.

Gary Tomlinson 00:28:44.598 All right. So where are we going to go? And like I mentioned, there’s not a lot of us in the company, although we’re starting to grow. I guess I didn’t mention one thing that I think about. So we are in the-we’re in the midst right now of a large card-flip. So we’re working with a merchant on a large card-flip from Visa to Mastercard. And there’s 3.2 million cards. And so we’ve got to get all this work done in the next three months. So that’s a lot for us at 18 people. And so we have a few extra contractors coming on, and we’ll be probably adding a few more people to the company. And the pipeline is growing. And so historically, as Katherine said, we’ve been a company-almost, probably, two-thirds has been developer [laughter] and one-third non-developer, and that’s going to change now. Like we have one salesperson. So we’re probably going to have to have more than one. We have one operations person. That’s probably not going to cut it. So we’re all kind of in operations ourselves, but obviously, as a company, we’re starting to go through this maturation because now we’re becoming successful, I guess. And so we’re keenly aware that Influx is also highly useful for other things. And so right now, we have kind of some simple AWS tools. We’ve got CloudWatch, and we’ve got X-Ray, and we’ve got a number of things. But really, we’re kind of at the early stage, operational. And our goal in the early days was build innovative technology, build some use cases people care about and get successful, get some customers [laughter], and then, with a revenue stream, get our act together and become a real company. So we’re very aware that Influx and the Kapacitor-we’ve seen a whole bunch of innovative ways that it’s been used by other Influx customers.

Gary Tomlinson 00:31:07.526 And so we’ve got some ideas. The leader of our operations team has-I’ve been cluing him in, so he’s been watching many of them. He’s got a lot of ideas. So I think that’s-in terms of where are we going, I think, two things. One. Basically, building a lot more-or utilizing Influx more for some of our operational support triggers and things like that. Starting to build some machinery around that and probably using the Kapacitor, as well, for capacity planning. And then, also, we need to dig into our failure conditions and stuff, too. Because like we said, if we’re streaming in where something has gone wrong-not our infrastructure had a problem in the sense of scale, the number of systems. Nothing really failed at that level. But things are going sideways with sites. Right? Meaning that our machine learning system doesn’t have coherency with them. How do you learn all those things? And so this is an area that we’ll probably be working on here in the not-too-distant future. Capturing those kinds of events and being able to trigger on those and go grab that stuff and analyze it. So in addition to analyzing infrastructural-type problems or things such as, “Oh. We don’t have our autoscaling really at the right policies in Amazon.” One thing we’ve realized in Amazon is it doesn’t just scale instantly. I mean, it’s like you got to have enough instances running, and you’ve got to get ahead of the eight ball if you’ve got the flash crowds, start to-or maybe not-either flash crowds or just changes in behavior. Right? Like your demand rates are sort of changing a little bit, in terms of, “Are they all compressed into smaller time windows?” and things. And right now, we’re having to watch that manually [laughter] and make adjustments. And so we’re like, “Well, that’s okay for a little while, but not very long.” So those are probably areas that our operations people-that would not be me, per se. 
I’m more of a development person, although I’m a lot better at operations than I used to be. But professional operations people will be working on that. So I think that’s probably where we’re going to go next with Influx. But so far, things have worked out very well for us. I think that’s about it, Chris. We’re not as mature as some of your other customers.

Chris Churilo 00:33:51.479 No. I actually have a lot of questions.

Gary Tomlinson 00:33:54.523 Oh. Okay.

Chris Churilo 00:33:55.473 And I’m sure everyone else listening probably has a lot of questions, as well. The nice thing about what you guys have been describing is that it’s not the traditional infrastructure monitoring or IoT monitoring. So this is definitely, I think, something that a lot of our users are pretty keen to be able to achieve, do any kind of real-time analytics, and also, apply machine learning to it. And I have to say that you mentioned a couple times that your company is small, only 18 people. But oftentimes, when we talk to our users, they’re also just on small teams, themselves. So I’m sure they appreciate-they probably have the same kind of feeling as you do, the “Hey. We’re a small team, so we definitely have to be more creative in how we can gather all this data and be able to get some kind of insight out of this data.” So most of the people that we talk to are in a similar situation as you guys.

Chris Churilo 00:34:53.142 So a couple of questions that I have is that-it never really occurred to me how useful it can be for these various institutions to understand which sites that their users go to compared to their competitors, and which cards that are being used. I guess, one of the questions that I have is: what do they-I think one of the things that you give visibility to your customers is that-what sites a cardholder uses their card on. Can’t they get that information just from looking at the transactions? Or is there something else missing that you guys are providing?

Gary Tomlinson 00:35:41.151 Well, they can’t. So there’s different views, here. Say, a given card issuer. Right? So they do get settlement statements. Right? And so the settlement statements can be helpful, although you’d be amazed at how unhelpful they actually are. So. And when we work with them, they can see-like for their card, they can see that you used such-and-such merchant. Right? But then, often, the merchants are somewhat obfuscated. Right? So then they’re not exactly sure. Right? And so we’ve got some machinery that can dredge through their settlement statements. And then, we use some of our machine learning on that to try and map them to our site’s directory. Right? So we can give them more clarity. So from an individual card issuer, their goal is not wanting to know where their particular cardholders are using the sites. Their goal is to get their card on those sites and get them replaced. Right? So it’s not so much that they’re doing intelligence-gathering as-the greatest fear of any card issuer that we’ve learned is any kind of an event. Like if you have a fraud event-right?-and now that card’s no longer useful, and now they got to get you a new card. Well, almost everyone has more than one card from more than one issuer for that reason. Right? You’re like, “Hmm. Well, I better at least have two?” Right? “What if one gets compromised?” Right? So if it does, the natural tendency of a person is you go, “Okay. I’m going to use my other card. That’s why I have it.” Right? But once you start to use the other card, now they need to-so the one that lost out wants you to get back in there. Right? And so they’re terrified of that. Right? So anything they can do to get those out there faster, easier, and keep that loyalty is super important to them. Essentially, the act of creating a card and shipping to you is just a cost center to start out. Right? You need to use that card. Right? 
And so they get interchange revenue-that’s that 4 or 3% or whatever-right?-overhead. They all get to share in the [inaudible] there, and that actually drives a tremendous amount of revenue for them. And, of course, they hope that you don’t pay it off in time and, hey, they get some extra interest, too [laughter]. But if you don’t use your card, it’s not very useful to them. And that’s actually why the merchant that we’re assisting with this card-flip, they have a very large loyalty. And you’re talking large amounts of revenue on the line for them in the order of hundreds of millions of dollars in revenue-right?-per year on these cards. And so any kind of a disruption is a big problem for them.

Gary Tomlinson 00:38:41.734 Now, the other one that you’re kind of, I think, trying to ask the question. So an area that we’re working on that we don’t have it all finished yet, but-so in the case of the TopWallet-right?-where the consumer now can have cards from multiple card issuers. Right? So in that case, what every single one of these card issuers wants to know is, “Where are they in the TopWallet, and who are they competing against?” And so that’s an area where we have to have permission of the consumer to let the card issuers that they have relationships with-not other people that want to dredge. But if you have one, say, from Chase and another one from Capital One, you have a relationship with both. And maybe another one from USAA or something. Right? So obviously, USAA, Capital One, and Chase in that example, they all want to know, “Hey. Am I the top-of-wallet card? Am I the one that gets used the most?” Or, “Who am I competing against, and how would I give incentives to the consumer?” This is something they have absolutely no other way in the industry to gain visibility into. So they’re all clamoring, and they want us to get that put in for them. But we’re like, “Yeah. It’s on the roadmap. One step at a time.” Right? There’s just only so many of us that could build this stuff, and we’ve got to build in a little better opt-in model for consumers. And so there is a big convergence that’s coming up with CardSavr, where they can digitally push cards over to, even, TopWallet users and assist them, but also gain some visibility. So they’re all dying to get that.

Chris Churilo 00:40:24.167 It’s so fascinating. It’s not something that I even think about. Probably a lot of us on this webinar don’t even think about those. I still have a bunch of questions, but Michael has a question in the Q&A. So Gary, if you want to read it out loud, and then answer it for Michael?

Gary Tomlinson 00:40:39.286 Oh. Let me see if I can-how do I see it?

Chris Churilo 00:40:41.790 You can just go ahead and-I think if you do Exit Full Screen, you should be able to see the Zoom app. And at the bottom, it says Q&A.

Gary Tomlinson 00:40:55.673 Where does it say Q&A? I see the Zoom app.

Chris Churilo 00:40:58.369 If you scroll on the bottom of the app, the Menu bar will show up. And if not, I can just read it out loud.

Gary Tomlinson 00:41:07.637 Uh, I don’t see it yet.

Chris Churilo 00:41:09.388 Yeah. No problem. So Michael asks, “I’m considering Kapacitor, and it sounds like you plan on using it for future work. You mentioned that, I think, your Harvester is doing work on quantum of Influx data, and that’s homegrown. Did you consider Kapacitor with maybe Python integration for this task, and what did you decide?”

Gary Tomlinson 00:41:27.012 We haven’t actually explored that yet. That might be a better way for us to do it. I think that the Harvester was created in an era before we even were doing Influx. So that’s actually a very good idea, Michael. I think we’ll probably consider that. It probably might be more efficient than what we’re doing. It’s just we haven’t had a chance yet to spend any time on that.

Chris Churilo 00:41:56.647 Can we go back to your architecture diagram? And then, Michael, please let me know if that answered your question. If you have a follow-up question-okay. Awesome. So you have your TopWallet application, and it’s sending-maybe you can just describe a little bit about what it’s actually sending to the hosted InfluxDB instance? I imagine, of course, there’s some kind of data that is a timestamp. It has like, I don’t know, some kind of URL or some information that you’re capturing. What other things are you applying? Are you applying some other tags? Just [crosstalk]-

Gary Tomlinson 00:42:37.045 Yeah. Yeah. Yeah. Yeah. So the idea is that we’re going to send over a URL. Or, certainly, that’s going to have a timestamp-right?-that’s going to get automatically added. We’re going to send over a URL. We’re going to send over the clickstream for the current form. Right? So when you’ve clicked around, right, any of you have-essentially, almost everything that you run today has a tremendous amount of JavaScript in it that comes down. Right? And so as you’re cruising around-right?-the JavaScript knows where you’re at on the site. When you click things or you hover over things, other things appear. I’m sure you’ve all seen these things happen. Right? And when you click on them-right?-that will cause you to transition into something else. So like, you’ll move into a new part of the form. It becomes visible, for example. A lot of times, things are not visible, and then you click on something and then, boom, it’s there. Right? And so these clickstreams are super important. Right? So we want to know the URL, but we also want to know the clickstream to get to something. And then, we’ll say, “Aha. We just crashed into what appears to be this kind of form. Oh, it looks like it’s a credit card form.” “Oh, it looks like it’s a billing form.” “Oh, it looks like-” whatever it is. So we tag all of those. Right? So inside the schema-right?-basically, we have types for all these things. So it says, “Oh, I’m on this URL, this is the clickstream that got to it, and this is the kind of form I think it is.” And it’s, “I think it is.” Right? Because in real time, we may not know. Or we might say, “Unknown.” It’s just we don’t know-it’s some kind of form, but they clicked to it and they’re authenticated, so that site might be of interest. Right? And so that’s where, later on when the Harvester comes along and grabs it, it can do a much deeper analysis and sort out what kind it is.
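
To make the shape of such a point concrete, here is a minimal Python sketch that formats one site artifact as InfluxDB line protocol. The measurement name `site_artifact`, the tag `form_type`, and the field names `url` and `clickstream` are assumptions chosen for illustration; the talk does not give Switch's actual schema. The timestamp is left off by default so that the server assigns it, which matches the "automatically added" timestamp described above.

```python
from typing import Dict, Optional


def _escape_key(value: str) -> str:
    """Escape commas, equals signs and spaces (tag keys, tag values, field keys)."""
    return value.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def _escape_measurement(value: str) -> str:
    """Escape commas and spaces in a measurement name."""
    return value.replace(",", "\\,").replace(" ", "\\ ")


def _escape_string(value: str) -> str:
    """Escape backslashes and double quotes inside a string field value."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def to_line_protocol(
    measurement: str,
    tags: Dict[str, str],
    fields: Dict[str, str],
    timestamp_ns: Optional[int] = None,
) -> str:
    """Format one point with string fields as a line protocol record."""
    tag_part = "".join(
        f",{_escape_key(k)}={_escape_key(v)}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f'{_escape_key(k)}="{_escape_string(v)}"' for k, v in fields.items()
    )
    line = f"{_escape_measurement(measurement)}{tag_part} {field_part}"
    if timestamp_ns is not None:
        # Omit the timestamp to let the server stamp the point on arrival.
        line += f" {timestamp_ns}"
    return line


# Hypothetical artifact: the form type is a best guess and may be "unknown".
point = to_line_protocol(
    "site_artifact",
    tags={"form_type": "credit_card"},
    fields={
        "url": "https://shop.example.com/checkout",
        "clickstream": "account>payment>add card",
    },
)
print(point)
```

Keeping the guessed form type as a tag makes it cheap to group and filter by, while the URL and clickstream, which take many distinct values, sit in fields. That split is a common InfluxDB design choice rather than something stated in the talk.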

Chris Churilo 00:44:28.474 And then, so when you’re doing this analysis, I imagine you must be doing a number of different kinds of analysis. [I imagine?], especially [as a?] machine learning-forecasting? So probably looking at which ones are the most popular or most frequently visited?

Gary Tomlinson 00:44:45.622 Yes.

Chris Churilo 00:44:46.382 Which I think is really easy for people to understand, when it comes to machine learning. But then, anomaly detection, as well. Right? Things that maybe are unusual, don’t expect, or maybe they’re-there must be a number of different things that you guys are learning from that machine learning. Can you maybe articulate on that a little bit more? Because I feel like for a lot of our audience, machine learning is a bit of a black box. Right? It sounds-

Gary Tomlinson 00:45:12.150 Well, of course. It’s like a black box. So there’s a lot of things going on. Right? So number one-right? Unfortunately, the data that’s available is not tagged very nicely for you. Like when you’re looking at an element, it doesn’t say, “I’m a password.” They can actually do that, but it’s rare. Right? And so we do some other things, like we also send in information about geometric relationships between things. And so as a human-I can say this because we’ve actually filed for a patent. So we can mention a couple things. So one of the things that we discovered early on was that since the form data-right?-the relationships of them actually matter tremendously, since, say, “This looks like it’s a four-part, four-character-wide form. That sounds remarkably like a credit card number, maybe. Right? Or-yeah. Probably a credit card number. Okay. Oh, let’s see. What else is near it? Oh. Why, look. That looks like a zip code, maybe. That looks like-okay. Yep. That’s probably a billing address.” Right? And so we also send other kinds of information in that we’ll talk about, like, where things are spatially related to each other. Right? And so that helps us a lot in our inferencing engine-right?-to sort of do a deeper analysis and go, “Yep. This thing clearly looks like it’s a credit card number.” Or, “This looks like it’s billing information.” Or, “This looks like-” and because a lot of times, over-or we have human assistants which can tag them, occasionally. Right? So if we don’t know, we’ll have a human look at it and say, “Oh. That looks like this.” Right? And so we can send that into our information system. So then, it realizes, like, “Oh. That’s a pattern.” There’s confidence intervals, too. Right? So when you get into machine learning, it’s like, how confident-it’s not like you’re perfectly right. You don’t say, “That’s 100%.” That’s probably not going to happen. But it’ll be like, “Is it 70, 80, 90?” Right?
And the more it looks like things we’ve seen in the past-right?-and the more they’re spatially oriented towards each other, the higher the confidence is. So there’s lots and lots of analysis that’s going on. Right? And the larger the data sets in any machine learning system, the more accurate they become over time. Right? Which is why this crowdsourced-right? The more things we see, and they begin to look like each other-right?-and there’s-let’s face it. Web designers, they all-well, not all. But a lot of them start to use patterns which are similar to each other in their user experience layouts. Right? And so that’s what we start to key on. Like, “Aha. We’ve seen that. That’s like UX42,” or something like that. Right? And so then, we say that’s a confidence level of 92% or something crazy. Right? And then, we have-for us to actually be confident-right?-we can set thresholds of how high does a confidence have to be-right?-before we’ll kind of go, “Yep. We’re going to go with that”? And so again, it’s-the trick in machine learning is to gain as much data as you can. Right? But instead of just trying to sift through massive amounts of just unstructured data looking for a nugget, we have a lot of context-right?-running with our app. So that helps us a lot, paring it down so it’s kind of a manageable amount of data. And then, Influx makes it pretty easy with the structure that we can apply.
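
The idea of combining field sizes, spatial layout, and a confidence threshold can be illustrated with a toy Python scorer. This is a hand-written heuristic, not Switch's patent-pending method or a trained model; the feature names (`x`, `y`, `max_length`), the score weights, and the threshold of 90 are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FormField:
    x: float         # left edge in pixels (hypothetical feature)
    y: float         # top edge in pixels (hypothetical feature)
    max_length: int  # maximum number of characters the input accepts


def card_number_confidence(fields: List[FormField], max_gap: float = 80.0) -> int:
    """Return a 0-100 score that the fields form a four-part card number input."""
    boxes = sorted((f for f in fields if f.max_length == 4), key=lambda f: f.x)
    if len(boxes) != 4:
        return 0
    score = 40  # exactly four 4-character inputs are present
    if max(f.y for f in boxes) - min(f.y for f in boxes) <= 5.0:
        score += 30  # all four sit on (roughly) the same row
    if all(0 < b.x - a.x <= max_gap for a, b in zip(boxes, boxes[1:])):
        score += 22  # neighbouring boxes are close together horizontally
    return score


def classify(fields: List[FormField], threshold: int = 90) -> str:
    """Accept the label only when the score clears the threshold."""
    if card_number_confidence(fields) >= threshold:
        return "credit_card_number"
    return "unknown"


# Four boxes on one row, 60 pixels apart, versus the same boxes scattered vertically.
aligned = [FormField(x=60.0 * i, y=100.0, max_length=4) for i in range(4)]
scattered = [FormField(x=60.0 * i, y=100.0 + 50.0 * i, max_length=4) for i in range(4)]
print(card_number_confidence(aligned), classify(aligned))
print(card_number_confidence(scattered), classify(scattered))
```

A real system would learn these weights from crowdsourced and human-tagged examples instead of hard-coding them, but the thresholding step works the same way: anything below the bar stays "unknown" until deeper analysis or a human tag resolves it.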

Chris Churilo 00:48:50.672 So a couple of really easy questions. So how are you ingesting data into InfluxDB from your TopWallet? Are you using one of our client libraries, or-?

Gary Tomlinson 00:49:02.325 Uh, yeah. So we actually-I didn’t really show that. It technically doesn’t go directly to the hosted Influx, this picture here. It actually goes through an intermediary. So we have a CardSavr API, which is actually taking the streamed data and then using your library to put it in.
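
One way to picture that intermediary is a small forwarder that accepts artifacts from the app and hands them to a client library in batches. The Python sketch below is an assumption-laden illustration: the class name `ArtifactForwarder`, the point schema, and the batching behaviour are invented here, and the talk does not say how the CardSavr API is implemented. The writer is injected as a callable; with the `influxdb` Python package, `InfluxDBClient.write_points` accepts a list of dictionaries in this shape.

```python
from typing import Callable, Dict, List

Point = Dict[str, object]


class ArtifactForwarder:
    """Buffers site artifacts and forwards them in batches (hypothetical design)."""

    def __init__(
        self,
        write_points: Callable[[List[Point]], object],
        batch_size: int = 100,
    ) -> None:
        self._write_points = write_points
        self._batch_size = batch_size
        self._buffer: List[Point] = []

    def submit(self, url: str, clickstream: str, form_type: str = "unknown") -> None:
        """Queue one artifact; flush automatically once the batch is full."""
        self._buffer.append(
            {
                "measurement": "site_artifact",    # assumed measurement name
                "tags": {"form_type": form_type},  # best-guess type, may be "unknown"
                "fields": {"url": url, "clickstream": clickstream},
            }
        )
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Send any buffered points to the writer and start a fresh buffer."""
        if self._buffer:
            self._write_points(self._buffer)
            self._buffer = []


# Usage with a stand-in writer that just records each batch it receives.
sent: List[List[Point]] = []
forwarder = ArtifactForwarder(sent.append, batch_size=2)
forwarder.submit("https://shop.example.com/login", "home>sign in")
forwarder.submit("https://shop.example.com/checkout", "cart>pay", "credit_card")
forwarder.flush()  # nothing left to send, so no extra batch is written
print(len(sent), len(sent[0]))
```

Injecting the writer keeps the sketch runnable without a database and makes it easy to swap in a real client call later.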

Chris Churilo 00:49:22.264 And then, what’s the volume look like?

Gary Tomlinson 00:49:24.306 Well, the volume is-not as high as you would-so originally, we had these aspirations that we were going to get to-TopWallet was going to take off and change the known world and everything. And then, we came to this harsh realization that trying to enter the consumer space is really hard. To get brand awareness, and then, on top of that, “Trust us. We’re really good security people. You should just trust us”-right?-“with all this data.” Right? Because everyone’s worried about Equifax leaks, or Target, or all these other horrible things. And then, on top of that, we have something that’s even much scarier that we have to store than credit card numbers. Right? Credit card information has a well-known fraud prevention system around-or fraud mitigation. Right? But they’ll give you new ones, it’s a credit card, they won’t charge you for all of the fraudulent stuff, yadda, yadda, ya. The one that’s terrifying is, “We have your credentials on all the sites you go to.” Imagine leaking those, what a disaster that would be. Right? So how in the world could we gain the trust? And so that’s where the card issuers and stuff have come in. And so building relationships with them, and with some of the major merchants, then it’s a much different story. And so the TopWallet app is in beta. It’s going to stay there for a while because we really don’t have a go-to-market. So the volume is not nearly as great as we thought it would be, and so, essentially, we haven’t really had to use the scale of Influx yet. But we know it’s there from our testing. And so I think as we gain some more financial institutions that private-label brand the TopWallet, as well as the large card issuers and merchants that we’re dealing with, then we’ll start to see those volumes go up. 
One interesting use case that we have going on right now is some card issuers that we’re starting to work with, one of their greatest challenges is going in and flipping-that’s how they refer to it is they flip [laughter]. They go in against a competitor and go, “Okay. We’re going to take your business, give you a better deal.” And so away they go. And in that case, it’s all these smaller institutions, like the regional power company, or whatever. Right? The more regionalized stuff. In that case, these card issuing banks that we’re working with, they want to use this TopWallet app for their own employees to go cruise those sites and then crowdsource the information in so that we learn how to interact with them. And our system picks them up, and we can support those sites. So we’re seeing some of that behavior now, but it’s not that great. So we’re not the right guys to ask for, like, “How are you doing with massive scale?” It’s kind of like we’re building that up. But hopefully, you guys can appreciate the challenge we face in gaining a trust relationship.

Chris Churilo 00:52:38.160 Yeah. I mean, we know from listening to your talk today that the scale is enough that you definitely can’t do it manually. Right? You can’t have human beings looking at all that data. It would take way too long.

Gary Tomlinson 00:52:47.856 No. Oh, it would take forever. So I mean, and we’ve-by the way, to seed our system, knowing that we can’t always get all the crowdsource, we’ve just been rolling along on the-so Amazon offers two services called Alexa. It’s very confusing. There’s Alexa everybody knows, which is the voice recognition system. Right? There’s another one called the Alexa ranking of sites. Right? And so we’ve been using the Alexa ranking of sites to gain insight into what are the biggest sites out there. And we’ve put the Harvester out there to kind of dredge them. And then, we have some people, humans, that go out and hit them and walk through them, and we kind of learn, ourselves, going through the TopWallet. And so that’s been helping us, but we are picking up more-like Katherine said, we support a lot of sites already. Right? And so we’ve been using that. But really, it’s to gain insight into all these-the ones you wouldn’t know about. Right? That’s actually the biggest thing about the crowdsourced. The ones that are more obvious, we have mechanisms to sort of seed that up front.

Chris Churilo 00:54:06.656 Very cool. If anybody else has any other questions, please feel free to post it in the chat or the Q&A. Or like I mentioned in the beginning, if you have questions afterwards, please feel free to send them my way. And I will forward them to Gary and Katherine, and we’ll definitely get those answered.

Gary Tomlinson 00:54:24.299 Oh. And we love people to tell us, like, “Why don’t you do this?” It’s [laughter]-

Chris Churilo 00:54:27.905 Yeah. Just like Michael did. Right?

Gary Tomlinson 00:54:29.732 Yeah. Just like Michael did. We’re like, “That’s a good idea.” We’ll all learn from each other.

Chris Churilo 00:54:35.702 I think this is really great. And even though your own solutions-and I’m talking to the audience-may not be exactly like this, hopefully, you can see that being able to throw your time series data against any kind of machine learning system can definitely help save time, and help you to detect those patterns that may help you understand how your infrastructure is doing, or help you find those things that are just very unusual that you might want to ignore, or you may have to do something about it. So I’ll keep the lines open for just another minute or two. And so in closing, at least for me-I’ll stop asking all these questions-where did you learn about InfluxDB?

Gary Tomlinson 00:55:23.261 Actually, so in startup land, it’s usually who you know. So we actually-several of us had a relationship with Evan Kaplan, who is your CEO. And so when we were struggling with Logstash, he mentioned that Influx might be of interest [laughter]. And so we went and started doing some homework on it. We’re like, “Oh. This is actually pretty interesting.” And so I think that’s-after we realized about Influx, and we started to look into who used Influx, that helped us a lot. First, we started to test with it. But then, when we saw some of the other folks that are using it, and some of the scale points they were using it at, we were like, “Oh. Okay. This is-yeah.” I mean, let’s face it. Could we generate a big enough load to bother Influx? Well, maybe if we constructed a massive AWS assault on it or something [laughter]. But I mean, it’s-anyway. It was far beyond what we were doing with our Logstash at that time. And then, I guess, at some point, you just trust others. Like, “Hey. These other guys are using really massive amounts of events, and it doesn’t [inaudible], so we probably could rely on it to do that.” We knew it would work for what we were doing, so. That’s actually how we learned about it, and I think in the industry, a lot of times, you hear, or someone will say, “You should look at this.” And yeah.

Chris Churilo 00:56:51.364 That’s cool. I’ll have to definitely let people internally know about that. Usually, people hear about it from Paul Dix, our co-founder and CTO. And so I think Evan will enjoy the fact that he was able to [laughter]-

Gary Tomlinson 00:57:08.334 No. Don’t tell him. He’ll get a big head.

Chris Churilo 00:57:11.781 He already has one [laughter]. All right. Well, this was a great session, and I appreciate you guys pulling this together and sharing with us a nice, unique use case, which I think a lot of our user community will find a lot of benefit from. And hopefully, they’ll be able to also apply some similar kinds of approaches to the systems that they’re managing, as well. So if anybody has any further questions, please feel free to email me, and I will forward it to Gary and Katherine, as I mentioned. And once I clean up the recording, then I’ll post it so you can take another listen to this, as well. Hey, Gary and Katherine, thank you so much. This was really informative and, actually, a lot of fun. And-

Gary Tomlinson 00:57:55.504 Yeah. Thank you so much for inviting us.

Chris Churilo 00:57:58.394 Yeah. And maybe we can hear a little bit more about the next generation that you guys-

Gary Tomlinson 00:58:05.014 Oh. Sure. We get our operational support systems up to snuff. Everybody else is probably laughing at us, “Sure.”

Chris Churilo 00:58:11.079 And then, if anybody happens to be in the Seattle area looking for a job, hey, check out SWITCH [laughter]. You never know.

Gary Tomlinson 00:58:21.846 That’s right. We’re always looking for really smart people, so.

Chris Churilo 00:58:24.653 Exactly. Awesome. Well, thank you so much, everybody, and have a pleasant day.

Gary Tomlinson 00:58:30.015 Okay. Thank you. Bye.

Katherine Chavez 00:58:31.269 Bye.

[/et_pb_toggle]
