Meet the Founders: An Open Discussion About Rewriting Using Rust
Session date: Jun 20, 2023 08:00am (Pacific Time)
Rust is a systems programming language designed for high performance, type safety, and concurrency. According to Stack Overflow’s annual survey in 2022, Rust is the most loved language, with 87% of developers saying they want to continue using it. The same survey also reported that nearly 20% of developers aren’t currently using Rust but want to start developing with it.
Ockam’s suite of programming libraries, command line tools, and managed cloud services enables developers to orchestrate end-to-end encryption. InfluxDB is the purpose-built time series database developed to handle time series data for IoT, monitoring, and real-time analytics. Ockam was originally developed using C, and InfluxDB was originally written using Go; both solutions have been completely rewritten in Rust. Discover why two founders decided to rewrite their developer tools using Rust, and gain insight into the strategy beforehand and the entire process.
Join this live panel as Mrinal Wadhwa and Paul Dix dive into:
- Their approach to rewriting a project in Rust
- How to build and train engineering teams
- Tips and tricks learned along the way - pitfalls to look out for!
Join this webinar for a live discussion with Q&A.
Here is an unedited transcript of the webinar “Meet the Founders: An Open Discussion About Rewriting Using Rust”. This is provided for those who prefer to read rather than watch the webinar. Please note that the transcript is raw. We apologize for any transcribing errors.
Speakers:
- Paul Dix: Founder and Chief Technology Officer, InfluxData
- Mrinal Wadhwa: CTO and Founder, Ockam
- Anais Dotis-Georgiou: Developer Advocate, InfluxData
- Caitlin Croft: Sr. Manager, Customer and Community Marketing, InfluxData
Caitlin Croft: 00:00:00.756 All right. Good morning, everyone. Good afternoon. My name is Caitlin Croft. I’m really excited to have you all here today. So if you’ve been on any of our webinars you know that we like to just kick things off with a few friendly reminders. So just want to remind everyone that this webinar is being recorded and will be made available for you guys to rewatch later today or tomorrow morning, and the slides will be made available as well. Please post any questions you may have for Paul, Mrinal, Anais, whoever you got questions for, in the Q&A or the chat. We will be monitoring both and will answer all of the questions at the end. And I also want to remind everyone of all the other fantastic resources that are available to our community. So there is the Slack workspace, there are the forums, there is InfluxDB U. So if you have questions, don’t be shy. We have amazing community members and our DevRel team in there answering questions. Even Paul’s in there answering questions, so feel free to bug Paul directly if you have any questions around InfluxDB. And we’ve been really busy the last couple of years rebuilding the database using the Apache Arrow ecosystem as well as Rust. So this webinar specifically is kind of diving into Rust. Our friends over at Ockam also rewrote their platform in Rust. So I think this’ll be a really interesting conversation between Mrinal and Paul about why they chose to rewrite their respective platforms in Rust and what did that mean. Finding the right engineers with the right skill set, and also training and getting their current teams up to speed, and all that. I don’t want to take away their thunder, but that’s just kind of a sneak preview of what we’ll be talking about here today.
Caitlin Croft: 00:02:02.797 And we have Anais, who is one of our amazing DevRels here to help moderate and help ask questions and all that sort of stuff. So don’t be shy. Throw those questions in. If she wants to ask them now, she will, or throughout the webinar. And if not, we’ll answer them at the end. So really excited to have you all here today, and without further ado, I’m going to hand things off to Anais.
Anais Dotis-Georgiou: 00:02:31.766 Hi. Thank you, Caitlin, and thank you so much, everyone, for joining. Today with us we have Mrinal Wadhwa and he’s the CTO and founder of Ockam. And we also have Paul Dix who’s the CTO and founder of InfluxData. And yeah, we’re going to talk about rewriting using Rust and the advantages of using Rust and also some of the hurdles. So to introduce myself briefly, like Caitlin mentioned, I’m a developer advocate. If you want to connect with me, I encourage you to do so on LinkedIn and I can point you to any resources to learn about all things time series and InfluxData, and also, any more questions that you have about this webinar today. So yeah, please reach out. I’d be happy to connect with you there. But now, I want to take some time to actually meet the founders, and if you both wouldn’t mind introducing yourselves and kind of laying the foundation for understanding why we’re here today. Paul, if you would like to go first.
Paul Dix: 00:03:39.123 Sure. Yeah. Thanks, Anais. Yeah. So as Anais mentioned, I’m the CTO and founder of InfluxDB, InfluxData. I’ve worked at a bunch of large and small companies, but actually, I think most relevant to InfluxDB and time series in general, in 2010, I worked for a fintech startup where I basically created a time series API for financial market data. And then I did the same thing when I started this company originally as a SaaS metrics and server monitoring startup. As far as for this webinar, I think, Rust is something that’s worth talking about. So at first, I was aware of Rust when InfluxDB — sorry, when we first created InfluxDB, but it was so early that it wasn’t something I was really paying attention to. In 2018, I started becoming more interested in Rust and kind of wrote a little bit about it. And I’ll talk a little bit about that more later. So been interested in Rust and working with it since about that. Mrinal, you want to go?
Mrinal Wadhwa: 00:04:50.902 Yeah. Absolutely. Thanks, Paul, and Anais, and Caitlin. And thank you, everyone, who’s joined today. Thank you for your time. So I’m Mrinal Wadhwa, I’m CTO at Ockam. And Ockam started around 2018, where Matthew Gregory, who is my co-founder and our CEO, and I really became passionate about solving problems of enabling distributed applications to have end-to-end trust in all data in motion. So we set out to build this system that would enable anyone to build an application that can trust data in motion. So we’ve done that. We’ve built that. We started out building that in C and then, sort of midway, we rewrote it all in Rust. And I’m sure as we dive in a little, we’ll dive into why and how. I was also aware of Rust in the ecosystem over many years. But I really only dug in in 2020, and we’ll talk more about what I learned along the way in that journey.
Anais Dotis-Georgiou: 00:06:13.535 Thank you so much for those introductions. So also I just kind of want to lay some context for everybody and ask you both what does your product do? And how can InfluxDB and Ockam be used together? Whoever wants to pick up first and share, it would be great.
Mrinal Wadhwa: 00:06:37.230 All right. Since Paul has his mic muted, I’ll go first. So as I said, Ockam enables developers to build applications that can trust data in motion. So what we do is we give you tools to add end-to-end encrypted, mutually authenticated communication to any application that could be running in any environment. So your app gets end-to-end guarantee of data integrity, data authenticity, and confidentiality. And it can get that across private networks, across clouds, through message streams like Kafka. Really, any multi-hop, multi-protocol topology. All communication over these topologies becomes end-to-end authenticated and private. And we also make all the really hard parts easy to scale. So things like bootstrapping trust relationships, safely managing keys and hardware, rotating and revoking credentials and/or making them short-lived credentials, and enforcing authorization policies that may be sort of ABAC or RBAC controls. So the end result is, you can build, easily, apps that have granular control over every trust and access decision inside that app. And so the app then becomes private and secure by design because it has control over these properties. And so that’s what Ockam does. I’ll let Paul talk a little bit about InfluxDB, and then maybe we can touch on how the two systems can work together.
Paul Dix: 00:08:17.370 Yeah. So InfluxDB is an open-source time series database. We originally created it in 2013, in late 2013. And when I say time series, what I mean is not just metric data but raw event data, any kind of historical event you would want to track over time, structured, semi-structured, so even log data, logs, traces, individual event metrics, those are all things that we view as time series. So this is useful in server monitoring, application performance monitoring, network monitoring, sensor data of all kinds, consumer-grade IoT, industrial IoT, and we have customers and users across all spectrums of those different use cases. All right.
Anais Dotis-Georgiou: 00:09:15.907 Thanks, [inaudible].
Mrinal Wadhwa: 00:09:16.237 To Anais’ question about how the two systems could work together, right? So as I said, Ockam sort of enables trustful communications across network boundaries or across clouds. So developers are using Ockam to securely connect applications that are running in one private network, let’s say their company’s private infrastructure or VPC in the cloud, with services that might be running in their customer’s or partner’s or vendor’s private network. And so one example of combining Ockam with InfluxData tools is, let’s say, Telegraf running in a factory network connecting to InfluxDB running in an enterprise data center. And making that connection happen, without opening any ports or dealing with clunky VPNs or anything like that, becomes really as simple as just starting two commands using Ockam. So that’s one example of how the two systems can be used together. Another example would be, because we have all this infrastructure around creating and managing short-lived credentials, we have an integration with InfluxData where we can take the InfluxDB API token and turn it into a leased, revocable, short-lived token. So if you’ve got a large fleet of things that are talking to your InfluxDB instance, they all can have unique API tokens rather than one sort of fixed hard-coded token, that if compromised would compromise your entire fleet. So these are sort of a couple of examples of how the two systems could be combined together.
Anais Dotis-Georgiou: 00:11:03.865 Thank you for the examples. So let’s actually get into, kind of, the meat of this webinar and ask you, why did you decide to rewrite in Rust and what challenges were you facing? Paul, do you want to start?
Paul Dix: 00:11:20.949 Sure. Yeah. So as I mentioned, we created InfluxDB in 2013. That version and all the versions up until 3.0 were written in Go. And as I mentioned, the vision for InfluxDB was that it would be the home for all sorts of time series data. Now, what versions 1.0 and 2.0 could do very, very well was metric data. But because of the architecture of the database, they didn’t necessarily work with high-cardinality data and other kinds of things, raw event data, as well as we would have liked. And it was kind of a result of the actual underlying architecture of the database itself, which is essentially a time series data store paired with an inverted index to map metadata to it. So when we thought about InfluxDB 3.0, there were certain problems that we wanted to solve, long-standing problems that we wanted to solve. One was this idea of the cardinality problem, right? So if you have metadata that’s very, very high cardinality, right, and a lot of unique values, this was something that InfluxDB just couldn’t do. So we had to figure out how to solve that problem. We also wanted to offer tiered data storage. So you have data moving from in-memory to a locally attached SSD, to cheaper spinning disk-backed object store. Because we find in our use cases, people frequently have a massive amount of data they want to keep around, historical data, but most of the time, they’re only querying the leading edge, very recently-written data. So we needed a way to be able to move data from expensive storage to cheap storage that’s just built into the database. So when we looked at these different problems we had to solve, what we realized was that that was essentially a rewrite of the database, right?
Paul Dix: 00:13:22.077 The databases that we had built for 1.0 and 2.0 shared the exact same architecture, and that just wasn’t going to work, right? So in early 2020, we started this effort — and at this point, it was basically a research effort with me and one of our other team members, and then we had Andrew join us in May of 2020. At that stage, we were just kind of feeling out, okay — I knew I wanted to do it in Rust, right? Because it’s 2020, not 2013. And I could see that Rust offered a bunch of things which we’ll talk about here in a little bit that would be beneficial to us, but I wasn’t sure what other tools we’d want to use. So essentially, we researched for six months, “Okay, we’re using Rust. What are the other pieces that we’re going to use to build this database?” But really the challenges that we’re trying to solve were — how do we offer infinite cardinality, tiered data storage, very high performance on — analytic performance on all these different time series data? And one other thing we wanted to do at this point was we wanted to bring SQL as a query language to the database, right? We had InfluxQL, we had Flux, but we wanted people to be able to query the database using SQL. So those were the problems we were trying to solve. Mrinal, do you want to talk about Ockam?
Mrinal Wadhwa: 00:14:50.386 Absolutely. That was very interesting. I’m now eager to hear the rest of your journey. Okay. So in Ockam’s case, we set out to enable trust for all data in motion. We wanted Ockam to run everywhere from constrained devices to powerful cloud servers, and we wanted Ockam to be usable in any type of application regardless of the language that application was built in. So early on when we started, the kind of natural conclusion of those criteria was, “Well, let’s do this in C because that way C can compile to 99% of computer targets.” And any popular language has some sort of native function interface that we could then write a wrapper around, which would allow us to write the core parts of our library in C and then make it idiomatic in a language-native wrapper for, let’s say, TypeScript or Python or something else, right? So the idea was we’ll keep the core parts of our communication protocols decoupled from hardware-specific behavior, and then we’ll have pluggable adapters for whenever we want specific hardware support. So, for example, we would have adapters to store secret keys in various different hardware security modules or HSMs, right? So we imagined we’d build our core protocols in C and because C is embeddable in these other languages, we’d be able to embed it into other languages to kind of provide libraries not only in C but in all sorts of other languages, and then, we’ll have these pluggable adapters for various hardware. This would allow us to run on all sorts of hardware and be usable in all sorts of programs. That was the idea. But we also cared about simplicity, right?
Mrinal Wadhwa: 00:16:52.393 It’s in our name. Ockam, as in Ockham’s razor. And so we wanted Ockam to be simple to use and easy to maintain, and this is where we really started to feel the pain with C, and C was really severely lacking. And that’s kind of what started the exploration of, “C is not quite doing what we need it to do.” And then we started exploring Rust a little bit, which I’ll talk about, I’m sure, in a second.
Paul Dix: 00:17:23.749 Yeah. So what about Rust actually caused you to look there? What specifics?
Mrinal Wadhwa: 00:17:29.564 So at Ockam’s core are essentially cryptographic and messaging protocols, right? So, for example, Ockam Secure Channel is this protocol. It is a multistep protocol. It runs asynchronously. It has state. And we wanted to abstract all of that away from an end user, and we wanted it to be a single-line function call to create an end-to-end encrypted secure channel regardless of what I’m doing, right? Regardless of my communication topology, I wanted it to be a simple function call. So the first challenge was — hide all of that protocol complexity. The second piece was cryptographic code tends to have a lot of footguns, right? So it’s easy to make a tiny little misstep and make your entire system insecure. And we wanted to make it so that that was not possible with our code and our end users would have a safe, easy-to-use library to create these secure communication channels. And our attempts at this in C were not very successful. Pretty much every iteration we wrote, too much protocol detail was exposed to the end user. So around that time, I’d, over the years, built various systems in Erlang and Elixir, so I decided I’d write a prototype in Elixir which is a different language. And because Elixir runs on the Erlang virtual machine, it has this native idea of Erlang processes which are lightweight, stateful actors. They’re an asynchronous state machine that runs concurrently from other things inside the same program. So using actors, I was able to hide all the complex state of our protocols inside an actor and expose a very simple interface, a single-line function to create an encrypted secure channel.
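The actor idea Mrinal describes can be sketched in a few lines of std-only Rust. Everything here is hypothetical — the names, the fake handshake steps, and the use of a plain thread plus a channel standing in for a real actor runtime; it only illustrates how a stateful, multistep protocol can hide behind a single function call:

```rust
use std::sync::mpsc;
use std::thread;

// A stand-in for the state the protocol produces; the real Ockam
// secure-channel state is far richer than this.
pub struct SecureChannel {
    pub peer: String,
}

// The "actor": a concurrently running worker that owns all the
// protocol state and reports back only the final result.
pub fn create_secure_channel(peer: &str) -> SecureChannel {
    let (tx, rx) = mpsc::channel();
    let peer = peer.to_string();
    thread::spawn(move || {
        // Stand-ins for the multistep handshake messages; a real
        // implementation would exchange these over the network and
        // advance an internal state machine on each step.
        for _message in ["m1", "m2", "m3"] {
            // ... send/receive, update handshake state ...
        }
        let _ = tx.send(SecureChannel { peer });
    });
    // The caller never sees the intermediate steps, only the result.
    rx.recv().expect("handshake worker terminated unexpectedly")
}

fn main() {
    let channel = create_secure_channel("influxdb.example.com");
    assert_eq!(channel.peer, "influxdb.example.com");
    println!("secure channel created with {}", channel.peer);
}
```

The point of the sketch is the shape of the API: one call in, one value out, with all protocol state encapsulated in the worker.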
Mrinal Wadhwa: 00:19:38.025 And then the actor would go away. It would run the multistep protocol but over the network, and then come back and say, “Secure channel created.” Right? So this was exactly what we wanted. But Elixir didn’t meet our other needs, right? Elixir isn’t very friendly for small, constrained computers. It’s not designed to be wrapped in other language native wrappers. So what we started to feel like was we needed something that is closer to C in terms of these other properties of being friendly to small computers and embeddable in other languages, but we needed to write an actor system in it. And C clearly was — we were struggling with C, so we didn’t want to go take on that task of writing actors in C. So that’s when I started to look at Rust and there were several properties that very quickly stood out to me that made things interesting. So the first one was Rust libraries can export an interface that is compatible with the C-calling convention, which means that any language that can call C code or a C library can also call Rust code and a Rust library. This means that our first objective of making Ockam usable in all sorts of languages, but only writing it in one language, could be easily met with Rust just as well as C, right? Then the second one was, “Well, okay, Rust compiles using LLVM.” So LLVM can target lots of different computers. It’s not the same base of computers as C with GCC and various GCC forks, but still a very large base of computers, right? And I also started seeing that there’s ongoing work to add GCC support in Rust and various other sort of targets. So I was, “Okay, this is a good bet, it’s not as good as GCC, but the other trade-offs make it interesting.” Right?
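The C-calling-convention point can be illustrated with a tiny hedged sketch (the function name `ockam_add` is made up for illustration and is not part of the real Ockam API): Rust can export a symbol with a stable name and a C ABI, which any language with a C FFI can then call.

```rust
// Hypothetical sketch, not the real Ockam API. `#[no_mangle]` keeps
// the symbol name stable in the compiled library, and `extern "C"`
// gives it the C calling convention, so it is callable from C and
// from anything with a C FFI (Python's ctypes, Node's N-API, a
// language-native wrapper, and so on).
#[no_mangle]
pub extern "C" fn ockam_add(a: i32, b: i32) -> i32 {
    a + b
}

// A C caller would declare it as:
//   int32_t ockam_add(int32_t a, int32_t b);

fn main() {
    // extern "C" functions remain ordinary Rust functions at home.
    assert_eq!(ockam_add(2, 3), 5);
    println!("ok");
}
```

Building the same crate as a `cdylib` or `staticlib` would produce a library that other languages link against exactly as they would a C library.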
Paul Dix: 00:21:43.565 And you were seeing this — around what time was this? This was —
Mrinal Wadhwa: 00:21:48.634 This was 2020.
Paul Dix: 00:21:50.700 Interesting. Yeah. So for us, one of the other things I forgot to mention about the problems we were trying to solve, there was one other big piece which is, our storage engine in 1.0 and 2.0 used memory-mapped files very heavily. And that was something basically we wanted to take out because using memory-mapped files in a containerized system, like if you want to run the databases inside Kubernetes, it becomes pretty tricky. So there’s again, another thing we had to change. So for me, I started looking into Rust in 2018, playing around with it. I thought it was super interesting. I wrote probably about a couple thousand lines of code creating a basic interpreter language with it. But it wasn’t until the fall of 2019, when the async/await stuff landed, right, for doing async/await. And once I saw that, I was, “Okay. I think all the tools are there within the language to build high-performance server software that has to deal with a lot of incoming network connections and do a bunch of stuff.” So in whatever early 2020, when we started doing the research to see what we were going to create InfluxDB 3.0 with, the advantages I saw in the language were obviously the type system itself, the fact that errors are first-class citizens in the language. Coming from Go, I was so tired of saying, “if err != nil, do whatever.” Right? I thought the way you deal with error handling in Rust was a bit more elegant. The package management system, Cargo — I’ve enjoyed using it basically from day one whereas I’d been working in Go from 2012 building the precursor stuff that became InfluxDB the first version, then all the way up, and package management was always a nightmare. I think they’ve solved that since then, but I haven’t been a Go developer since 2016, so I’ve missed the later iterations.
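The error-handling contrast Paul draws can be shown with a small hypothetical example (the function and the port-parsing scenario are invented for illustration): fallibility lives in the `Result` return type, and the `?` operator propagates errors where Go would repeat `if err != nil` at each call site.

```rust
use std::num::ParseIntError;

// Hypothetical example: parse a port number from a config string.
// The signature advertises that this can fail and with what error;
// `?` forwards the error to the caller in one character.
fn parse_port(s: &str) -> Result<u16, ParseIntError> {
    let port: u16 = s.trim().parse()?;
    Ok(port)
}

fn main() {
    // Success and failure are both ordinary values to match on.
    assert_eq!(parse_port(" 8086 "), Ok(8086));
    assert!(parse_port("not-a-port").is_err());
    println!("ok");
}
```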
Paul Dix: 00:24:00.108 But I really appreciated that about Rust. And for us, one of the things that we thought would be an advantage that we didn’t end up using was, at the time, in 2020, I thought that we were going to end up pulling in a big C or C++ library to do the query parser, planner, optimizer, and executor. I thought we were actually going to borrow from another open-source project for the actual query engine itself, and I thought we were going to have to pull that in into C++. And so the ability for Rust to bring that in with zero cost was appealing to me. That isn’t what we ended up doing. We ended up betting on DataFusion, which is a SQL parser, planner, optimizer, and executor that’s written entirely in Rust. And for me, even though that project was fairly early in 2020, we decided that whatever we were going to be building InfluxDB 3.0 around was something that we were ultimately going to have to develop a lot over the years. And I was excited about the idea of having that entire system being written in Rust from the ground up because of the things like the concurrency primitives that you get in the language, right? The compiler protects you against data races and all these other things. So having all that built into the language and having the compiler enforce it means that there are whole classes of bugs that we certainly created in InfluxDB 1 and 2 that you just can’t create in Rust. The compiler ensures that you don’t create those things. So all of those things, and obviously, the performance of the language itself was — for us, it’s a database, it needs to be highly performant. And I guess the last thing that I should highlight coming from Go is the fact that we don’t have to deal with a garbage collector. Right?
Paul Dix: 00:26:05.383 The garbage collector frequently is a pain in a database where you have tons of — particularly in a time series database, we have tons of tiny little objects coming into life and going away very quickly. And in Rust, it was appealing to us to not have to deal with the GC and basically to just have more control over memory management and all that other stuff. Were any of those other things important to you as you were looking at Ockam?
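The compiler-enforced concurrency guarantees Paul mentions can be sketched with a minimal, hypothetical example: this program compiles only because the shared counter sits behind `Arc<Mutex<_>>`; handing the same mutable integer to eight threads directly would be rejected at compile time rather than surfacing as a data race in production.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical sketch of shared mutable state across threads. The
// types encode the synchronization: Arc shares ownership, Mutex
// guards mutation, and the compiler refuses anything less.
fn count_in_parallel(threads: usize, per_thread: u64) -> u64 {
    let counter = Arc::new(Mutex::new(0u64));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    // Deterministic despite the concurrency: every increment is locked.
    assert_eq!(count_in_parallel(8, 1000), 8000);
    println!("ok");
}
```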
Mrinal Wadhwa: 00:26:34.528 Yeah. Several of those things that you just said remind me of things that resonated at the time, right? So first, I already made the point that cryptographic code tends to have footguns, right? And the type system and being able to sort of design your invariants in such a way that they turn into compile-time errors rather than run-time errors. It’s huge, right? Because that means we’re less likely to make mistakes. It also means our users of our libraries are also less likely to make mistakes because they’ll also see compile-time errors, right? So that was huge. The borrow checker was obviously a big deal because that meant that instead of paying the performance cost of a garbage collector we could, at compile time, eliminate various sorts of memory errors and race conditions. So to your point, huge classes of bugs just can’t exist in our codebase. Right? It’s all about probabilities, and we’ve got a big kind of probability win here that we’re less likely to make mistakes. But the most important part in what you said that was, I think, a big reason we picked Rust was async/await. So async/await, what was interesting about it was not only that it existed, which meant other people had already done a lot of the hard concurrency work we needed for this actor abstraction we needed for our simple interface, but it also had a — async/await in Rust is distinct from async/await in JavaScript, for example, right? In JavaScript, the browser or Node.js decides how the async function is run. But in the case of Rust, you can plug in your own mechanism of executing async functions. These things are called async runtimes, and what’s cool about that is you can have async runtimes that are optimized for different environments.
Mrinal Wadhwa: 00:28:38.590 So the more popular one, like Tokio, is optimized for really highly scalable environments. But then there are alternative async runtimes - there’s one inside the Ockam codebase as well - that are optimized for tiny devices, right? And so the abstraction for the end user remains the same, a simple worker design, which in our case is called Ockam Workers or Actors, right, and that interface stays the same. But the underlying execution gets optimized for the environment that that program is running in. And so this property of async/await’s design was very attractive to us.
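The pluggable-runtime idea can be sketched in std-only Rust: `async fn` desugars to the `Future` trait, and anything that can poll a `Future` — Tokio, an embedded executor, or the toy `block_on` below — counts as a runtime, while the async code itself stays unchanged. This is a deliberately minimal, hypothetical sketch, not how Tokio or Ockam’s embedded runtime actually work.

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A no-op waker: sufficient to drive futures that never park.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

// A toy "async runtime": polls one future to completion on the
// current thread. Tokio or an embedded executor plugs into the
// exact same `Future` trait.
fn block_on<F: Future>(mut future: F) -> F::Output {
    // Safety: `future` is a local that is never moved after this pin.
    let mut future = unsafe { Pin::new_unchecked(&mut future) };
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(output) = future.as_mut().poll(&mut cx) {
            return output;
        }
    }
}

// The async code is identical no matter which runtime executes it.
async fn handshake() -> &'static str {
    "secure channel created"
}

fn main() {
    assert_eq!(block_on(handshake()), "secure channel created");
    println!("ok");
}
```

The design choice this illustrates is exactly the one Mrinal highlights: the `Future` trait is the stable contract, so the same worker code can be scheduled by wildly different executors depending on the target environment.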
Paul Dix: 00:29:16.879 Yeah. I think that was one of the things that appealed to us as well, is this idea that Rust basically gives you a level of control over runtime behavior that you just don’t get out of other languages, right? Obviously, you would get it out of C or C++ because you’re creating most of those primitives yourself or adopting libraries. But yeah, the thing is, for us, we just use Tokio under the hood, but it’s nice to know that for different places, we can actually open up the escape hatch and go use some other things, right? So.
Mrinal Wadhwa: 00:29:54.265 Yeah. All of these properties were very interesting. So looking at all of this, I was convinced we wanted to rewrite in Rust. But then the problem becomes we’re pretty far along with C and we already had a team of C developers or developers of other things, and so that created an interesting challenge.
Paul Dix: 00:30:20.233 How long did it take you from first playing around with Rust, thinking, “Maybe we want to rewrite in Rust,” to actually making the decision to say, “Yes, this is something we’re going to do”?
Mrinal Wadhwa: 00:30:38.442 So about two months to start to feel like this is really the direction we should go in. But should we make the switch from a business standpoint? A little longer than that, right? Because it was essentially, throw away everything we did over the last six, eight months and kind of bet the company on this move to Rust. And so that was a big decision. It took us, I think, three, four months to really arrive at that.
Paul Dix: 00:31:05.887 How large was your existing codebase at that point, the C codebase that represented your product that you were rewriting?
Mrinal Wadhwa: 00:31:14.160 I don’t know the exact numbers, but tens of thousands of lines already, right? So we’d written a bunch of stuff already. So yeah, it was a big decision. How was the transition for you? So once you figured out, “We have to do the rewrite,” what did that look like?
Paul Dix: 00:31:36.055 Yeah. So I guess that the realization came really in 2019 that, “If we’re going to achieve these different goals that we have for the database, we’re basically going to have to rewrite it.” And then, again, my thinking by early 2020 was, “Well, if we’re going to rewrite it, I want to do it in Rust.” But for us, we had just gotten 2.0. Our cloud platform, our 2.0 cloud platform was something that we were still trying to mature. So we didn’t want to fully commit to it, right? So basically in early 2020, it was, “Okay, let’s start this up as a research project.” And we ran it like that up until basically November of 2020, so from, say, March of 2020 to November of 2020. And by, I’d say, late summer, early fall of 2020, I was, “Okay, we’ve got the tools we’re going to use. We’re going to use Rust. We’re going to build it around Apache Arrow. We’re going to use Parquet as the persistence format. We’re going to use the DataFusion SQL engine.” But even then it’s, “Okay. My plan for how to do it was form a small team who can focus only on this rewrite,” and even then, you’re talking about a rewrite of a database. So it’s not obvious in November of 2020 that we’re going to get through that project and it’s going to be successful, right? So at that point, it’s basically, I’d say, probably about 10% of our engineering effort that we’re putting towards this, right, maybe a little bit more. And we ran it like that for two years before we started looping in a much larger part of the engineering team. So we got pretty far along developing it over that two-year span before we built up the confidence and were, “Okay, we’re definitely going all in on this and this is what we’re going to do.”
Paul Dix: 00:33:37.541 And I’d say we flipped that switch probably last summer, is when we started bringing in a lot more of the engineering team and saying, “Okay. 2023, we’re going to launch this as basically version 3.0 of the database as this complete ground-up rewrite.” But within the company, when I first started talking about it, people were, “No, you are not going to rewrite the database.” [laughter] Because it’s definitely a high risk. And I think it’s a high risk and, depending on your situation, potentially a high-reward situation. And for us, we’re still landing the different parts of that this year, right? We had the first two releases earlier this year with another release coming up in a few months, and then probably another open-source release coming up towards the end of the year or early next. So for our part, we’re still probably, I’d say, 8 months to 12 months out from having the complete thing landed where we’ve transitioned over. But we —
Mrinal Wadhwa: 00:34:51.997 Very interesting.
Paul Dix: 00:34:53.037 Yeah.
Anais Dotis-Georgiou: 00:34:53.790 Paul, I have a question for you real quick. You mentioned that, at first, you were thinking about using C++ libraries in Rust, and then, I think, being affirmed after async/await was introduced in 2019 that Rust was a really good option and really looking into it. At what point did looking at the Apache ecosystem, which was also being written in Rust, things like DataFusion and Arrow — how did that move the needle? Or were you already decided that “We were going to use Rust either way because of the advantages that Rust has by itself?”
Paul Dix: 00:35:27.900 Yeah. So I’d kind of already decided on Rust. Like I said, by fall of 2019, once the async/await stuff landed, I was like, “Okay, Rust is going to be the thing.” Because my thesis, then and now, is that this kind of high-performance server software is going to be written in Rust more and more going forward, right? And there we’re probably going to see a number of other database projects and high-performance server software that’s written in Rust, right? So basically I just had conviction about Rust by that point. And then Apache Arrow came along and, again, I researched it, I looked at it quite a bit. I know the creator Wes McKinney from data science stuff and obviously, he’s very famous from doing Pandas. So I think I wrote something about Arrow in early 2020 where, basically, part of the thesis I had with InfluxDB 3 was that a columnar database designed for the cloud, basically to work in the cloud from the ground up, right, to separate compute from storage, would actually end up being useful for time series workloads, right? Basically if you optimize the columnar database specifically for doing time series stuff, it could do time series as well as all the data warehousing and analytics use cases that columnar databases are famous for being good for. And that was kind of the thesis of the core infrastructure and architecture of the database. So essentially, in 2020 where we’re in that research mode, what we were looking at was what are the different open-source columnar database engines, and can we adopt one of those as basically the building block that we create the database around? So DataFusion was one of the ones we were obviously evaluating. We evaluated ClickHouse. We evaluated DuckDB. And again, this is in early 2020, so it’s very different now, three years later, than it was then, right? Those projects weren’t as mature as they are now, right? DataFusion included. 
The thing that appealed to us too about DataFusion is, it’s written in Rust. So we get all those advantages and, also, our team who’s working on the database doesn’t have to switch between working in C++ or C and working in Rust, right? They could just work in Rust the entire time — which is, I think, an advantage.
Mrinal Wadhwa: 00:38:14.109 Yeah.
Anais Dotis-Georgiou: 00:38:14.254 For sure. Thank you.
Mrinal Wadhwa: 00:38:16.578 Actually, let’s talk about that conviction because I share that conviction. I feel like a lot more projects, especially new projects, are going to just choose Rust as the option for the kinds of systems they’re targeting. But tell me your thoughts — and I can add why I think there’s so much conviction or I feel so much conviction, but tell me why you think that’s going to be the future?
Paul Dix: 00:38:42.659 Yeah. I mean, I think just for all those reasons, right? The concurrency, fearless concurrency that you get from the compiler, right? The guarantees that it gives you around concurrency. The borrow checker, right? Not having a garbage collector, having more runtime control, right? The only thing I think that’s a limiting factor for the language is the fact that it is a very, very large footprint. It is more difficult to learn than a language like Go. I mean, I think that’s a fact. I personally view it as a fact. At least it was for me, it was a lot easier to learn Go than it was to learn Rust. But all those different things I think add up to create a language that is just really useful for server software where performance matters. But I think there are other interesting areas where it comes in, which is these embedded use cases that you were talking about. Being able to run on all sorts of different architectures with small resource footprints, which is something we care about in other areas as well. We want the database to be able to run not just inside Kubernetes, inside a data center, we want it to be able to run closer to the edge as well as part of our long-term vision. And I think Rust will give us the ability to seamlessly move from the data center down to edge devices just because of the properties of the language.
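The “fearless concurrency” Paul describes can be seen in a tiny generic sketch (not InfluxDB code): the compiler refuses to let threads share a mutable counter unless it is wrapped in `Arc` and `Mutex`, so a data race simply cannot compile.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Increment a shared counter from four threads. The compiler forces
// Arc (shared ownership across threads) + Mutex (synchronized
// mutation); removing either is a compile error, not a runtime race.
fn parallel_count() -> u32 {
    let counter = Arc::new(Mutex::new(0u32));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..1000 {
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }

    let n = *counter.lock().unwrap();
    n
}

fn main() {
    // Deterministically 4000, because unsynchronized access to the
    // counter is unrepresentable in safe Rust.
    println!("{}", parallel_count());
}
```

Deleting the `Mutex` or the `Arc` turns this into a compile-time error rather than an intermittent runtime bug, which is the guarantee being referred to.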
Mrinal Wadhwa: 00:40:19.472 Yeah. I think it’s not just going to be systems that care deeply about performance, it’s also going to be systems that need to operate in constrained environments. It’s also going to be systems that care deeply about security because of these properties and because they naturally lend themselves to secure software. So if you start reasoning about that, you very quickly end up with what systems are not going to be in Rust, and I think it’s going to be systems where people are okay with — they want the fast development win, right? With Go, you can build certain things a lot faster than in Rust. At least until the point where you’re comfortable with Rust, right? Once you’re comfortable with the language, I think you can get almost the same speed. But if you’re not comfortable with it, Go is so much easier to build something quickly with. Similarly, if you’ve got an environment where certain languages just work better, those are the things that I think people will continue to do in other languages. But Rust is going to be [inaudible] tool, I feel.
Anais Dotis-Georgiou: 00:41:33.564 So on that note, I also want to move the conversation a little bit forward and ask, in general, what were some of the challenges that you had to overcome to actually start using Rust? And specifically, how did you get your engineers comfortable with learning Rust? And what have maybe been some of the improvements within Rust itself that have made working and using Rust easier?
Mrinal Wadhwa: 00:41:59.334 One of the biggest challenges with using Rust that I didn’t appreciate enough until I was really in it, was — you need to know a lot about type design and abstraction using type design, which isn’t necessarily a Rust thing. You just need to know things that only really Haskell people cared about before Rust, right? There may be some amount of Scala folks, right? But essentially, in most languages I had been using, that level of type design and thinking about algebra as related to types was just not a thing I had to deal with in day-to-day development. But with Rust, you have to go learn it, and I think there’s not enough material, even in the Rust books. And that topic isn’t handled very well. Usually, people who seem to know it have come from backgrounds like Haskell and Scala. And so you kind of have — to design good interfaces and good abstractions, you have to go seek out books in other languages, which I ended up doing.
Paul Dix: 00:43:07.422 Yeah. I think the design patterns in Rust are a little bit different because of the borrow checker and because of the way the language works. I saw somebody ask in the chat about the transition from Go to Rust. So there are a number of things with that. So one which I should highlight, we’ve been talking about how great Rust is on every level and all these other things. There’s one area where Rust suffers, particularly when compared to Go, which is the time it takes to compile the program, right? Go’s compiler is super fast. It builds a program fast. The Rust compiler is a lot slower. And there are a number of things that we’ve done inside our project in how the project is structured to try and make it so that certain compile times will be faster or whatever. And I’m not sure how good that can possibly get on the language. But for the people I’ve seen on our team who’ve made the transition from Go to Rust, I’d say it depends on what additional programming backgrounds they have, right? If they only come from Go and a dynamic language like Python or Ruby or JavaScript or TypeScript, then I think it’s a harder transition than, say, somebody who’s coming from Go but also used to be a C++ programmer. What I’ve seen, generally, is the people who used to be C++ programmers make the transition easier than just Go or dynamically-typed language people, right? And I’m in the latter camp, right? I was a Rubyist for a while. I’ve written code in C++ and all these other things, but not for a decent length of time, right? Most of my career was spent in C#, in Ruby, and then Go. So making the transition, for me, I thought it was particularly difficult. So what helped for me was the free Rust book online, which is good.
I found the O’Reilly Rust book, Programming Rust, to be very, very helpful because I read through both of those in my learning process, and I recommend the same to all of our people who are picking up those skills.
Paul Dix: 00:45:22.146 And the other thing is just trainings. So there’s a consultancy that we work with, Integer 32, and they will do trainings for your team of people, and I think that’s useful. I think having somebody around that you can just ask questions is super, super helpful. And then, also, code reviews and stuff like that, those help, right, doing pull request reviews and having people who know what they’re doing look at the code to say, “Oh, you want to use this pattern or this idiom,” because those are the things that take a little bit of time to pick up.
Mrinal Wadhwa: 00:46:06.144 Yeah. I also had mostly dynamic languages, even though I tinkered with other stuff, and there are some things you can do in dynamic languages you just can’t do in Rust. I was talking about designing an actor system. An actor system has to route messages, and you get to these strange things where, if you’ve got a layered system and layers above know about certain types, layers below now need to deal with — so in our case, specifically typed messages had to pass through a router that had no information about those types. And so suddenly, a thing that would be really simple to do in a dynamic language becomes very hard to do in a typed setting, especially a strictly typed one. And so you end up having to kind of work around these constraints. And similarly to your point about compile times, I think that’s a very conscious tradeoff in the language. They moved the runtime concern of managing memory into a compile-time concern. So now, the compile times take really long, and I’m not happy about how much time we’ve had to spend optimizing those compile times just to get everyone to be productive. So those challenges are definitely a problem. In terms of getting people in, I also found that people with C++ backgrounds were able to pick it up easily. People with Swift backgrounds on our team were also able to pick it up easily. Very similar kinds of things exist in Swift. People with dynamic languages struggled the most; some didn’t make the transition and moved on to other things. So there was all sorts of that.
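The routing problem Mrinal describes, typed messages crossing a layer that knows nothing about those types, can be approximated in Rust with trait objects. This is a generic illustration using `std::any::Any`, not Ockam’s actual router; all names here are made up.

```rust
use std::any::Any;
use std::collections::HashMap;

// The router only sees opaque boxed payloads: it knows addresses,
// not message types.
struct Router {
    mailboxes: HashMap<String, Vec<Box<dyn Any>>>,
}

impl Router {
    fn new() -> Self {
        Router { mailboxes: HashMap::new() }
    }

    // Accepts any message without knowing its concrete type.
    fn route(&mut self, addr: &str, msg: Box<dyn Any>) {
        self.mailboxes.entry(addr.to_string()).or_default().push(msg);
    }

    // The receiving layer recovers the concrete type by downcasting;
    // asking for the wrong type yields None instead of crashing.
    fn receive<T: 'static>(&mut self, addr: &str) -> Option<T> {
        let msg = self.mailboxes.get_mut(addr)?.pop()?;
        msg.downcast::<T>().ok().map(|b| *b)
    }
}

#[derive(Debug, PartialEq)]
struct Ping(u32);

fn main() {
    let mut router = Router::new();
    router.route("worker", Box::new(Ping(7)));
    let ping: Option<Ping> = router.receive("worker");
    assert_eq!(ping, Some(Ping(7)));
}
```

The cost of this pattern is exactly the tension described above: type information is erased at the routing layer and has to be re-established, with a runtime check, on the other side.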
Mrinal Wadhwa: 00:48:08.714 But one thing I noticed was, in the Rust community, there were a lot of people with not enough experience in general; maybe they’d only been doing programming for six, seven years. They knew Rust really well and somehow had found Rust and had gone deep into the language, but they didn’t know the other things you learn as a programmer as well. And so what we tried to do was pair people with deep Rust knowledge with other people with deeper backgrounds in other things we cared about, whether it was protocol design for certain specific cases and things like that, and putting people with those two types of skills together resulted in both groups learning and getting more productive. So that worked out really well for us.
Paul Dix: 00:48:56.484 Yeah. Yeah. We did the same thing where we brought in Integer 32 as consultants to bring the deep Rust knowledge. We paired that with people who had experience building columnar databases, right? And most people who build columnar databases — a lot of those people are basically C++ programmers because those codebases are written in C++. And the thing is, like I said, this has been a multi-year effort. So now the columnar database people have become Rust experts as well over that period of time, so.
Mrinal Wadhwa: 00:49:30.976 Yeah. And I think multi-year is a good gauge. Basically, it takes years to get good at this language, which is slower than with some of the others. Go, I think, would be a good contrast. I think you can get very good at Go pretty quickly. So yeah, that was an interesting set of observations.
Anais Dotis-Georgiou: 00:49:52.142 And we have an interesting couple of questions that have come in. The first one is have any AI assistants like ChatGPT been helpful, useful, harmful, or unused during the transition? And then a second one has been — how much of your previous codebase resembles your Rust codebase, if at all?
Paul Dix: 00:50:12.519 Yeah. Well, so somebody posted an example of using GPT-4 to ask a Rust question in one of our Slack channels, our internal company Slack channels, a month or so ago. And it was incredible; the response was very, very good. It was way better than what you would get out of a search engine if you try and put some random Rust thing in and try and get an answer. But the truth is, most of the people made the transition before GPT, before ChatGPT came onto the scene. So for us, it’s still kind of to be determined how much that helps. I mean, I assume over the next few years, every developer is going to be using AI systems to help write code because it’s kind of like, “Do you use an IDE?” Right? Many people use an IDE, or they use something like Vim or Emacs configured to basically act like an IDE. You use that to optimize how you can produce code, and I think AI assistants are going to have the same effect on development. As far as the overlap between the codebases, I can say there’s very little for us, right? Because we went from Go and a certain database architecture to Rust and a completely different database architecture. And the people who developed each were completely different sets of people. One person who helped develop the Go codebase helped develop the Rust codebase. Well, I guess, two, if you count me, but I wasn’t involved in a lot of the Go codebase from 2016 onwards, right? So yeah, for us, there was very little overlap between our Go codebase and the Rust codebase. I mean, there was a lot of overlap in terms of user-facing functionality, obviously, because we’re creating the database and we actually implemented the 1.0 API natively inside the Rust implementation. But the way the code looks under the hood is completely different.
Mrinal Wadhwa: 00:52:31.958 Yeah. That’s very similar for us as well. I think the interface design decisions we made in the C iteration, a lot of those interface choices have managed to stay similar in Rust, but the actual implementations are usually very different. And then to the question about ChatGPT. I earlier mentioned some of the type design, type inference things are hard to reason about, and I find that putting those things into ChatGPT and asking it to explain will result in a much better explanation than you can find in a few Google searches. So I think the time trade-off is a win with GPT-like tools. But it also returns incorrect things all the time. So just be aware that it’s a tool to learn some context, right, really, that’s where it’s useful, and that context might be wrong, so just run with that assumption. It’s very similar to searching on Stack Overflow. It’s just that instead of 10 searches, you get one sentence, but put just as much trust in that as you would in a random web page sharing code with you.
Anais Dotis-Georgiou: 00:53:54.793 And then, also, I was hoping you guys could share some tips that you learned along the way, both for Rust specifically, maybe how you overcame or reduced some of the compile time issues and error messaging. And then, also, organizationally, how did you navigate onboarding engineers, how did you communicate this goal to the rest of the company, and maybe something that, looking back now, you wish you had known, or some advice you would give to your younger self. Yeah.
Mrinal Wadhwa: 00:54:32.767 I’ll attempt the organization questions a little bit. And this played out internally in our team and then, again, in our open-source community. So we now have 200-plus open-source contributors who’ve come and contributed to our project. And most of those people aren’t experts in Rust, right? They’re people learning Rust and exploring and looking for projects where they could learn Rust in a more realistic context, and our project kind of gives them that opportunity, right? But to make that work, what was really helpful was thinking about how we can create simple modular pieces that someone could get started with building instead of just throwing them into a complex system, right? So both internally within our team and externally within our community, we put extra effort into designing parts that had only a specified interface, and we said, “Go implement this module under that interface.” And so it created this contained environment where someone could kind of experiment, learn, get a review, get feedback, etc. And that approach worked really well because it wasn’t a toy problem; it was a real system feature, but it was still isolated in a safe place to learn.
Paul Dix: 00:56:11.467 Yeah. So I could just talk about the compile time stuff and the project organization. Right. So one of the things we did is we just created a bunch of individual crates, right, and that also goes to how you organize the code. If you can organize it into all of these separate modules or crates, then ideally those individual crates won’t be recompiled every single time, right? So you can reduce some of your compile time that way. And if you look at large projects written in Rust, you see that that’s what they tend to do. What we looked at was Linkerd, which was another Rust rewrite. That was a project where they had version 1, and in version 2, they rewrote it in Rust. So that’s a large project that we looked at and took inspiration from as we were building out InfluxDB 3. Organizational tips from the actual org, yeah, I’m not sure. I think one of the things I would say is if you’re looking at doing a rewrite — if you’re just building a project from scratch in Rust, then great. [laughter] Have at it. But if you’re looking at doing a rewrite, basically the one thing I’ll say is — and I don’t know that we could have done this with InfluxDB 3, but find some way to do incremental rewrite steps first, right? So if you have a large codebase and you’re like, “Okay. I want to rewrite in Rust.” It’s best if you can find individual little pieces that you can rewrite in Rust and replace. And because you can embed Rust into any number of other languages through the C bridge, you may be able to find a way to do incremental pieces as opposed to a big bang rewrite, which I would say, if there’s any way you can avoid a big bang rewrite, you should avoid it. [laughter]
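A hedged sketch of the crate-splitting Paul describes: a Cargo workspace whose members are small crates, so incremental builds recompile only the crates whose sources changed plus their dependents. The member names below are purely illustrative, not InfluxDB’s real layout.

```toml
# Top-level Cargo.toml for a hypothetical workspace.
# Editing `query` recompiles `query` and `server`, while `storage`
# and `wal` stay cached from the previous build.
[workspace]
members = [
    "storage",   # illustrative: persistence layer
    "wal",       # illustrative: write-ahead log
    "query",     # illustrative: query engine
    "server",    # illustrative: binary crate depending on the rest
]
resolver = "2"
```

Each member directory then holds its own `Cargo.toml` and `src/`, and inter-crate dependencies are declared with `path = "../storage"` style entries.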
Mrinal Wadhwa: 00:58:21.446 Yeah. We did a little bit of that. We had the luxury of not having end users when we took on the rewrite, right? So that made things a little bit simpler. But we did do it in a modular way, right? So we took a module inside our C codebase and said, “Let’s go attempt rewriting this in Rust before we attempt the rest of the system.” And so that incremental approach gave us confidence at every step because we could see that the two things worked the same. Which means when we put on the next layer, we can make it work without getting too lost in the complexity. So yeah, that’s great advice. Around the build pipelines, etc., there’s a lot of work that’s gone on within the community around improving that. And there are some folks that spend a lot of time writing about how to do that sort of compilation in CI pipelines better. So this is an active topic of discussion in the community, and almost every month, I find a new [inaudible]. For example, I found this project called Nextest, which does a way better job of running tests in parallel versus the standard cargo test command. And that made a huge difference in our CI pipelines, for example.
Anais Dotis-Georgiou: 00:59:57.208 Great. Thank you. And so we talked a lot about the benefits of Rust, specifically things like the borrow checker and concurrency guarantees and optimized memory management. But can you talk about some of the benefits that you’ve seen from the move to Rust within your organization?
Paul Dix: 01:00:20.348 Yeah. So I think one of the things for us is when I announced that we were building this new core of the database in November of 2020 and that it was written in Rust around Apache Arrow, one of the benefits that we got was that I said at that time, “Oh, also we’re building a team to develop this.” And we had a bunch of really great people apply to join who we were able to hire because of the fact that it was written in Rust. Some of the people who joined were database people, and they were more interested in the database side of things and Rust was not specifically why they came to us. But they were definitely people who came to us because of the fact that it was written in Rust and they wanted to work in the language. So that was definitely a benefit that we got. I’m not sure from an organizational side, other than that, we’ve seen much. I mean, most of it is just on the core technology itself. I don’t know. Mrinal?
Mrinal Wadhwa: 01:01:25.158 Yeah. I think that that point — oh, sorry, Anais, go ahead.
Anais Dotis-Georgiou: 01:01:28.824 I was just going to ask if you also want to share any benchmarks. That would be cool too. [laughter]
Paul Dix: 01:01:35.644 Those are coming.
Mrinal Wadhwa: 01:01:38.518 That specific point of — experienced developers are starting to realize how Rust is a different paradigm, and it’s very good for building certain kinds of systems. And so a lot of people are looking for places to go work and build things in Rust. And so to Paul’s point, we continuously see people who are like, “My day job is this other language, but I’m very interested in making a switch to Rust. This is my context, etc.” And so we see a lot of people come in from that, and that’s good for our team. Another interesting thing that’s happened there is, because a lot of people are interested in transitioning to Rust, there’s a growing community around people doing open-source contributions. So in our case, we were able to hire people from our open-source community. So people who were already contributing to our project. It becomes much easier to hire that way because you already know that the person is capable of navigating our codebase, they’ve already figured out a bunch of parts of it, and they’re already contributing. So it becomes a lot easier. And so I feel like, because of this state we’re currently in, it’s a good opportunity to build teams around Rust because of this dynamic.
Anais Dotis-Georgiou: 01:03:08.986 That makes sense. It’s attracting people that like to think about things like type design and memory optimization and all of that.
Mrinal Wadhwa: 01:03:17.914 Definitely. So before we finish or open up to some Q&A, just real quick, how have your communities responded to the switch to Rust so far?
Mrinal Wadhwa: 01:03:29.974 Go ahead, Paul.
Paul Dix: 01:03:31.492 I mean, so far, I think people are pretty excited. There’s definitely some excitement about Rust itself, but I think mostly those are people that aren’t necessarily from the InfluxDB community but are just Rust community members. I think for the InfluxDB community, people are excited more about the new capabilities of the database, right? The new things that it’s able to do versus the fact that it’s in Rust. When we first built InfluxDB, we made a big deal of the fact that it was written in Go, and that was in the early days of Go. Go 1.0 was released in March of 2012. And we had the announcement that we were creating InfluxDB in the fall of 2013. Similarly, we make a point of the fact that InfluxDB 3 is written in Rust because we want to help promote Rust as a community and Rust as a language. Particularly for this kind of software, I think, the InfluxDB community — they really care more about the actual functionality of the database than, say, what language it’s written in, but I don’t know.
Mrinal Wadhwa: 01:04:47.362 Yeah. In Ockam’s case, we’re building a system that enables other people to build secure systems, right? So we’re building tools for them to build secure systems. And as more and more people learn about how Rust enables secure systems, I think we see some sort of resonance because of that. Because they’ve learned in some separate context that Rust enables secure systems, and because Ockam is written in Rust, they can kind of make a judgment that, “Okay. Because of those properties that Rust lends, Ockam also is making the right decision.” So we see some of that from a product standpoint. And in general, the community has been very receptive. It’s a very friendly community. And we really enjoy kind of engaging and participating and doing things inside the growing Rust community.
Anais Dotis-Georgiou: 01:05:55.037 Sorry. One second. I’m clicking the wrong thing here. Yeah. With that, thank you so much for those answers. And we do have some questions coming in from the chat if you still have time. So one thing is — do you still have some time?
Paul Dix: 01:06:12.957 Yes, I do.
Mrinal Wadhwa: 01:06:14.034 Yeah. I have five minutes or so.
Anais Dotis-Georgiou: 01:06:17.592 So the first question we have is, what does Go do well compared to Rust apart from compilation time?
Paul Dix: 01:06:32.112 I mean, I think, obviously, like I said, the learning curve for Go is much easier. I think it’s easier to get a broader set of developers contributing to your project if you’ve written in Go versus written in Rust. But again, for the core of the database, usually, you don’t have a lot of people coming in and contributing to that anyway. I think, from my experience, Go is a very productive language, and it’s easy to write a piece of server software that has relatively great performance very quickly, right? Whereas Rust, you have to deal with the language itself and the different — are you using the Tokio runtime for async/await? All these other things. But honestly, I really do believe that for software like a database, I think Rust is a superior language. I would not use any other language to create it. If I were creating a database project from scratch, Rust is the only language I would use right now to do that.
Mrinal Wadhwa: 01:07:45.948 In terms of things I don’t like, I am personally not a fan of large dependency trees. And the reasoning in my mind is that a bigger library results in a smaller surface in terms of the number of people you’re dependent on, right? Whereas lots of small libraries literally create hundreds, if not thousands, of people you’re dependent on for your software. And in the Rust ecosystem, they’ve obviously taken the stance to go towards lots of small libraries. And it makes me less comfortable versus, let’s say, the Erlang/Elixir ecosystem, which has a tendency towards large libraries. The C++ ecosystem has a tendency towards large libraries. And I feel like there are tradeoffs there, but clearly, Rust has made that small-library tradeoff which, in my book, has some negatives. [crosstalk] —
Paul Dix: 01:08:47.931 Yeah. I mean, it’s difficult. There was another question I see here in the chat that says, “What is your approach to handling supply chain sanity? Do you have some review validation process in place?” Right? And that speaks directly to the issue that you’re talking about, which is how do you handle a lot of small dependencies. The InfluxDB Rust codebase at this point has a bunch of different dependencies, and we depend on Apache Arrow’s Rust implementation, which has a ton itself. We don’t actually have a good answer for that right now, right? We definitely have the team review if we’re going to pull a new dependency in. And now, at this stage, we are hesitant to pull new dependencies in. But if you walk the dependency tree of InfluxDB 3, it’s already very, very large at this point, so.
Mrinal Wadhwa: 01:09:43.164 Yeah. Same thing with us. To that question about supply chain sanity, there’s a bunch of tooling that’s emerging. Obviously, the Cosign/Sigstore ecosystem’s got some Rust-specific tooling. We’re, in our team, trying to comply with SLSA from a build-pipeline standpoint, which is a spec from Google around having some controls in place. So we can do things inside our pipeline. But the dependency tree, especially if it’s hundreds of libraries, it’s very hard to — do we really review those hundreds of libraries? Technically, we review when a library gets pulled in, but do we go read its code? No, right? And so there’s so much risk that comes from those large dependency trees, and I think the whole software ecosystem still hasn’t figured out how to deal with it.
Anais Dotis-Georgiou: 01:10:48.766 Excellent. Thank you. We also have one last question for you, Paul, specifically. Did you use any actors while rewriting 3.0?
Paul Dix: 01:10:58.994 No. No. So the 3.0 codebase, like I said, uses Tokio as the scheduler, and then there are a lot of locks around there. [laughter] A lot of mutexes, a lot of read-write locks sitting around. We use a crate called parking_lot, which is quite good for that. So yeah, we are not using an actor-based approach inside the system.
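A minimal sketch of the lock-based style Paul describes. It uses `std::sync::RwLock` so it runs without external crates; the `parking_lot` crate he mentions offers a near-identical `RwLock` that skips the `unwrap()` on lock poisoning. The catalog and its contents are illustrative names, not InfluxDB internals.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

// Shared state guarded by a read-write lock: many concurrent readers,
// exclusive writers. This mirrors the lock-based (non-actor) style
// described above.
type Catalog = Arc<RwLock<HashMap<String, u64>>>;

fn writer(catalog: &Catalog, table: &str, rows: u64) {
    // Exclusive access for the lifetime of the write guard.
    catalog.write().unwrap().insert(table.to_string(), rows);
}

fn reader(catalog: &Catalog, table: &str) -> Option<u64> {
    // Any number of readers can hold this guard at the same time.
    catalog.read().unwrap().get(table).copied()
}

fn main() {
    let catalog: Catalog = Arc::new(RwLock::new(HashMap::new()));

    let c = Arc::clone(&catalog);
    let h = thread::spawn(move || writer(&c, "cpu", 1024));
    h.join().unwrap();

    assert_eq!(reader(&catalog, "cpu"), Some(1024));
}
```

With `parking_lot`, `catalog.write().unwrap()` would become `catalog.write()`, since its locks do not track poisoning; otherwise the code reads the same.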
Anais Dotis-Georgiou: 01:11:32.467 Excellent. Well, thank you so much. I think that’s all the questions that we have so far. So Caitlin, do you want to close us off?
Caitlin Croft: 01:11:39.361 Absolutely. Thank you. Thank you, Paul and Mrinal, and Anais, thank you so much. I know there were a lot of people asking about the recording. So yes, this has been recorded and will be available tomorrow morning. So just check back on the registration page, the same exact URL that you registered, and the recording will be made available. And I also think we might trim this content into some blogs and whatnot since I think it was a really interesting topic. If you guys have any other questions, please feel free to email me. All of you should have the email. I’m happy to put you in contact with Paul and Mrinal, and they can answer it there, as well as once again, the Slack community. I know they are both in there, so I’m sure they’d be happy to hear from you. Thank you, everyone, for joining. I know we went a little over, so really appreciate you guys sticking with us, and I hope you have a good day.
Paul Dix: 01:12:37.035 Okay. Thanks, everyone.
Mrinal Wadhwa: 01:12:38.408 Thank you, everyone.
Caitlin Croft: 01:12:40.691 Bye.
[/et_pb_toggle]
Mrinal Wadhwa
CTO and Founder, Ockam
Mrinal Wadhwa is CTO and Founder at Ockam. He is passionate about Distributed Systems, Applied Cryptography and the Internet of Things. At Ockam, Mrinal and his team are building open source tools to help developers build secure-by-design applications that can Trust all Data-in-Motion.
Paul Dix
Founder and Chief Technology Officer, InfluxData
Paul is the creator of InfluxDB. He has helped build software for startups, large companies, and organizations including Microsoft, Google, McAfee, Thomson Reuters, and Air Force Space Command. He is the series editor for Addison Wesley’s Data & Analytics book and video series. In 2010 Paul wrote the book Service-Oriented Design with Ruby and Rails for Addison-Wesley. In 2009 he started the NYC Machine Learning Meetup, which now has more than 13,000 members. Paul holds a degree in computer science from Columbia University.