Season 9 Episode 6 Jun 15, 2022

The Evolution of SQL and How it Managed to Last Through Time


SQL is the language that we use to drive mission critical applications.


In this episode, we talk with Jim Walker, principal product evangelist at Cockroach Labs about the evolution of SQL. Learn more about the origins and importance of SQL and how it's managed to not only last but also evolve throughout time.


Ben Halpern

Forem - Co-founder

Ben Halpern is co-founder and webmaster of DEV/Forem.

Jeremy Friesen

Forem - Lead Software Engineer, Content Experience

Jeremy Friesen is an open source software developer focused on mentoring, process improvement, and crafting analogies.


Jim Walker

Cockroach Labs - Principal Product Evangelist

Jim is a recovering developer turned evangelist who digs useful, cool, cutting-edge tech. He loves to translate and distill complex concepts into compelling, simpler explanations that broader communities can consume. He is an advocate of the developer and an active participant in several open source communities.






[00:00:00] JW: You know, if you look at our lives, I mean, since we woke up today, how many times do you think some keystroke or mouse click hit Oracle, Db2, or SQL Server just in your life so far today? And our lives are run on SQL databases.


[00:00:25] BH: Welcome to DevDiscuss, the show where we cover the burning topics that impact all of our lives as developers. I’m Ben Halpern, a co-founder of Forem.


[00:00:32] JF: And I’m Jeremy Friesen, Lead Software Engineer at Forem. And today, we are talking about the evolution of SQL with Jim Walker, Principal Product Evangelist at Cockroach Labs. Thanks so much for being here.


[00:00:43] JW: Yeah. I’m happy to be here guys. Thanks for having me on.


[00:00:45] BH: So whether you pronounce it S-Q-L or sequel or anything in between, we are going deep on the subject today and talking about the evolution of this foundational technology. But before we get into that, Jim, can you tell us a bit about yourself and your background?


[00:01:02] JW: So guys, I was originally a developer. I was coding since I was like 11 years old, way back, like Commodore 64. I’ll age myself a little bit. My undergrad is in computer engineering. I coded professionally for seven or eight years. The first language I ever coded in was Smalltalk, which is if you don’t know what it is, it’s one of the most beautiful languages ever created. The IDE that was for it was just amazing and then that kind of language died and went away. But I ended up coding like Java and C++, but I was a horrible coder. I was a hack. Dude, I hated unit testing. I hated comments, like a lot of people do. I hated any sort of the stuff around coding. But I was always the person who was able to explain what was going on. And so I kind of naturally gravitated towards becoming a product marketer, actually, taking kind of really complex concepts and translating them into English so that people can consume them and start to understand why we do these things. I’ve been a product marketer for really quite some time, but I’ve really focused on emerging tech. This is my 10th startup in a row. All of them have been successful except for one, unfortunately. So I’m almost like perfect, but that’s okay. I’ve spent a large portion of my kind of past like couple, I guess, decade or so in data. I was at a company called Talend for a while. I was at Hortonworks, helping kind of drive the whole Hadoop thing, worked at CoreOS for a while. So I kind of got introduced to distributed systems a while back. I’ve been in that Kubernetes community for a while. And when I saw Cockroach Labs, it was kind of a foregone conclusion for me. It was like the combination of all the things I love, data databases and distributed systems, and that’s how I kind of landed here. So that’s kind of the quick history of Jim, I guess, and my professional career.


[00:02:36] JF: So Cockroach Labs, what do they do? And a little more detail about what’s your role there. I think you alluded to it.


[00:02:43] JW: So what we do is we outfit cockroaches with radar equipment and send them into like enemy zones. No, we don’t do that. We are makers of CockroachDB. It’s a database that is actually cloud native from the ground up. This isn’t like, “Let’s take a database and change it a little bit so it can be cloud native. Let’s take a layer and make it distributed.” This is kind of, “Let’s reimagine, let’s completely rebuild and re-architect the database so that it’s cloud native and built for distributed systems.” And that’s what we’re doing here. We’ve been at it for about eight years. I think the repo was first created about eight years ago. It’s open source. You can go check it out. And I have had some pretty great successful companies adopt us and the momentum is building here. And so it’s a relational database, but it’s a relational database built for cloud and built for kind of distributed systems. And so my role here, I am principal product evangelist, I guess, is my new title. I was VP of product marketing for a while, but my job now is to really go out and preach. I don’t sell products anymore. I preach a belief system. So that’s what my job is now, guys.


[00:03:42] BH: So you talk about CockroachDB being cloud native from the ground up, distributed architecture, built brand new in a way, but it’s also a SQL database. So there’s a component of this that was explicitly building on established ideas and conventions and stuff. And that’s really the topic of the episode. So when it comes to CockroachDB being built from the ground up, but also only in a way, can you talk about the SQL component of the whole thing?


[00:04:12] JW: So if you look at any database, there’s really kind of three layers to every database. There is storage because, ultimately, I mean a database does nothing but write things to disk, like literally that’s what it does. Right? And then there’s an execution layer and execution is also where the magic really happens. I mean, storage also, there’s a lot of really cool stuff going on down there. From a distributed systems point of view, the way we can get naturally resilient, the way we can scale, the way we replicate data, there’s a whole lot of really cool stuff. Execution is also really, really difficult in a distributed environment. So how do you kind of build those things from the ground up to be cloud native and distributed? A whole lot of really, really awesome software engineering has gone into that. But if you look at the language side of it, how are people going to communicate with this? Very early on, Peter Mattis, one of the founders at Cockroach Labs, decided that we were going to be wire compatible with Postgres and implement as much of the SQL syntax as possible, that which made sense in a distributed system. Because ultimately, I mean, look guys, SQL is the language that we use for these mission-critical applications. People still learn SQL. There’s not too many developers that I know at least that don’t know what SQL is. I mean, it is the language that basically drives kind of mission-critical applications. If you look at our lives, I mean, since we woke up today, how many times do you think some keystroke or mouse click hit Oracle, Db2, or SQL Server just in your life so far today? And our lives are run on SQL databases. When you start to think about those legacy databases, what is that going to mean in the next three to five to ten years?
Rearchitecting underneath that is really important yet making it acceptable and adaptable and kind of familiar with the way that we code today I think was a really, really critical decision, because I think that’s how you get adoption and it’s not just about adoption. That’s kind of how businesses are run is a lot on SQL. And that was part of the choice.


[00:06:00] JF: What do you think contributes to the lasting power of SQL in the developer ecosystem?


[00:06:05] JW: I think it’s relational algebra, guys. SQL is math. The whole debate around SQL goes back into the mid-eighties, and sorry, but Oracle won. There was something called QUEL, if you go back in time, Q-U-E-L. And so there was this debate going on back in that day and the Oracle team kind of won out. And the entire context and concepts around these things was calculus or algebra and basically SQL won. And it works because you know what? Normalization is a thing when you really start thinking about databases because that actually affects performance. Cardinality of queries is actually really important. That affects performance of queries, right? The relational database is important. Referential integrity is important, joins, relationships. All those things are powerful and developers have relied on the database to deliver that power around data to fuel their applications. So they don’t have to code that, like what belongs in the database and what belongs in the application? That’s a great question. And so fast forward to NoSQL days where they kind of stripped away some of the constructs of the relational database and made this kind of BASE database, is what they call that, and you were now left to do things like aggregates using code. And so where should that be done? Well, there’s tradeoffs, and I think SQL is extremely powerful. The relational database is incredibly good. Look at the adoption of Postgres. It’s everywhere. Man, it’s a super powerful tool. That’s why it’s a kind of lasting technology, in my opinion.


[00:07:27] BH: So SQL is lasting, but it’s also evolving. What do you think has been that evolution? We’re building on a solid foundation, but what are the most important things that have changed?


[00:07:38] JW: The language of SQL hasn’t really changed a whole lot. I mean, there’s new features within Postgres every year. There’s new kind of little things happen, but there hasn’t been huge massive changes to the syntax in really quite some time. I think as developers though, we don’t want to think about the database. So I think this movement towards ORMs and kind of fixing that mismatch between the way we think and code was this object to kind of this SQL world, which is tables. I think that’s still a challenge for a lot of developers, honestly. And I think some want to use ORMs. I know a lot of developers do, but software engineers don’t want to. I think they get worried about the performance. I love the development of things like GraphQL. Some of the things like Prisma is doing, that sort of stuff, to make this a little bit more simple and a little bit easier for developers to adopt SQL, but I don’t see SQL going away. I just think we’ll build tools around it to kind of simplify these things because there’s still a whole lot of power in it.


[00:08:34] JF: Yeah. You’re speaking to a developer here who has long used Rails, but I started using Rails after I had extensively used Db2 and also Access databases and SQL for querying. And Rails was great until it got in the way. And then you inevitably are like, “I’m going to write some SQL.” And so having the best of breed available has really made a tremendous impact on what we can deliver. So I can definitely see this foundational layer as so critical. And I guess I’m kind of wondering if there are any practical competitors to using SQL.


[00:09:16] JW: Not really. I mean, SQL’s everywhere. If you look at really the power of SQL, do you really want to code a join outside of the database, you guys? Do you really want to go pull all the data and actually run your own join and actually code that algorithm? I don’t want to do that craziness. No way. Let the database do that. And the funny thing is about databases and especially relational databases, they aren’t easy to build. They simply aren’t. We were talking earlier, like, “What do you want to just code on top of flat files? Hey, good luck.” You know what I mean? What can go wrong with data will and the difficulty of building databases is all the crazy corner cases that you have no idea are going to happen to you. And that’s why there’s only X amount of databases. If it was easy to build databases, we would just do that underneath each of our apps. Yeah, we would have flat files or it would be like some simple thing, but, A, not simple, and then, B, I think the power of actually doing it at a layer and giving that value back to the developer to save them time I think is a big deal. Do you want to code for referential integrity, you guys? I don’t want to do that. I don’t want to do that. It sounds like a nightmare. Right? And so why am I doing that when I just want to build my business application or my application logic?
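To make the point about coding a join yourself concrete, here is a minimal sketch in Python using the standard library `sqlite3` module. The `orders` and `order_lines` tables and their rows are hypothetical, chosen only for illustration; SQLite stands in for any relational database and is not what the speakers use. It runs the same join twice, once in SQL and once by hand.

```python
import sqlite3

# Hypothetical tables and rows, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE order_lines (
        id INTEGER PRIMARY KEY, order_id INTEGER, item TEXT, qty INTEGER
    );
    INSERT INTO orders VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO order_lines VALUES
        (1, 1, 'widget', 2), (2, 1, 'gadget', 1), (3, 2, 'widget', 5);
""")

# Option A: let the database do the join.
db_join = conn.execute("""
    SELECT o.customer, l.item, l.qty
    FROM orders AS o
    JOIN order_lines AS l ON l.order_id = o.id
    ORDER BY l.id
""").fetchall()

# Option B: pull both tables into the application and join by hand
# (a simple hash join keyed on the order id).
orders = conn.execute("SELECT id, customer FROM orders").fetchall()
lines = conn.execute(
    "SELECT id, order_id, item, qty FROM order_lines ORDER BY id"
).fetchall()
customer_by_order = {order_id: customer for order_id, customer in orders}
manual_join = [
    (customer_by_order[order_id], item, qty)
    for _line_id, order_id, item, qty in lines
    if order_id in customer_by_order
]

print(db_join == manual_join)
```

Both options produce the same rows here, but Option B moves every row over the wire and leaves index use, join ordering, and memory limits to application code, which is the work the database is already built to do.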


[00:10:22] JF: Could you just quickly share what you mean by referential integrity to a lot of folks?


[00:10:27] JW: Oh my God. I got to explain referential integrity. No. I mean, referential integrity, look, I mean, in the relational data model, it’s basically maintaining relationships of primary keys across tables. It’s like I have an order table and I have order lines and I can’t have order lines that aren’t referencing orders because each order consists of five or six different lines, right? And to have something just orphaned, your data’s bad, right? Letting the database ensure that you can’t create an order line associated to no order is actually pretty important when you’re actually thinking about your inventory and what money you made and these operational, transactional workloads. That’s a really key concept. It’s basically a relationship between one table and another.
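The order and order-line example can be sketched with a foreign key constraint. This is a minimal illustration using Python's standard `sqlite3` module; the table names, column names, and the helper `try_insert_orphan` are assumptions made for the example, and SQLite is only a stand-in for whichever relational database you use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE order_lines (
        id INTEGER PRIMARY KEY,
        order_id INTEGER NOT NULL REFERENCES orders(id),
        item TEXT NOT NULL
    )
""")

# A line that points at an existing order is accepted.
conn.execute("INSERT INTO orders (id) VALUES (1)")
conn.execute("INSERT INTO order_lines (order_id, item) VALUES (1, 'widget')")


def try_insert_orphan(connection):
    """Try to insert an order line whose order does not exist."""
    try:
        connection.execute(
            "INSERT INTO order_lines (order_id, item) VALUES (999, 'ghost')"
        )
    except sqlite3.IntegrityError:
        return False  # the database refused the orphaned line
    return True


orphan_inserted = try_insert_orphan(conn)
line_count = conn.execute("SELECT COUNT(*) FROM order_lines").fetchone()[0]
print(orphan_inserted, line_count)
```

The application never checks whether order 999 exists; the constraint declared on the table does that, so no code path can create an orphaned line.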


[00:11:12] BH: You sort of often mentioned like the days of NoSQL. Are you sort of referring to like effectively the hype cycle about 10 years ago where a lot of NoSQL was in the news and there was a lot of interest and pressure, like is this going to challenge the idea of SQL as the totally mainstream thing? Is that kind of what you’re talking about with the days of NoSQL? And kind of speak to what came and went there.


[00:11:40] JW: Look, we’re still living in the days of NoSQL, Ben, honestly. I think NoSQL is an amazing tool, like a document database, something like Mongo is really, really powerful, but it just really depends on what workload you’re going to use that for. Right? I think that’s the trick. And too often developers just choose a database because that’s what they’re familiar with. And I find that to be a flawed choice. Right? Because the database can actually deliver power to you and you’re going to use a different type of database for a different problem. Would you use a graph database to monitor transactions of our nuclear codes? No. You’re going to use a graph database to understand the relationships between things and what’s the closest path between X, Y, and Z. That’s why we created those things. Why do we have time series databases? Well, it makes sense because we start to analyze log files or that side of it, it makes sense. Right? Why do you use NoSQL? NoSQL is really good for things like, if I already understand my access pattern, I know what I want. It’s not going to change a whole lot. I know I’m going to have to deliver something to the front end and this is what the data’s going to look like. That’s really great. You know what I mean? But if I actually have to do some sort of thinking or allow the developer, allow the user to basically mix and match their data, SQL and relations get really, really powerful. And so I don’t think it’s like days of anything more than anything. I think it’s a really great time to be alive because we have different databases for lots of different things. And our applications are nothing but a collection of different workloads all coming together to deliver some sort of value. And we get asked a lot, “Oh, how are you guys different than Redis or one of these cache databases?” Well, use that for what it’s good for, but use us for the system of record. You know what I mean?
And so I think you use a combination of things to accomplish what you want to do. That’s what I see some of the most successful applications doing today.


[00:13:21] BH: If it’s a flawed choice to have a hammer and treat everything as a nail or not explore the right tool for the job in a domain, is there another angle there being a flawed choice of being overly exploratory or shiny new thing or anything like that of like going with something that is maybe too tight a fit to what you think you need when you might want to go with something maybe more generic or something like that? Where does that side of the coin fit in?


[00:13:49] JW: That’s a personal choice and I’m with you. I think a lot of people chase the shiny new things sometimes. They think it’s like the coolest things since sliced bread that they don’t look underneath the covers. I think those who make good choices are those who understand architectures of what these things do. And if you’re curious about what tool is going to fit the job, you’re going to understand the physics of the screwdriver versus the physics of the hammer. One is downwards, but one is angular momentum and the other is just raw force. And literally, depending on what your job is, you should understand how the thing works. And I think it’s on all of us to do a little bit of that. And so if it’s a new tool, great. I love looking into the architecture or something to understand how it works so I could see where it’s going to fit, and I think that’s the line that the people who can really start to look into these things and not just believe the marketing hype. I’m the marketer telling you, don’t believe the hype. But still, there is substance under the hype and it’s typically in the docs. So go get in those, Git repos and check out docs and read architecture sites. Guys, here’s one thing I don’t trust. I don’t trust tools that don’t expose their architecture, like in their docs, like show me how it works. I trust you. It’s cool, but I want to see how it works. I want to know what your thought process is. In this kind of era of open source and this sort of stuff, I think why not? Why not show it? And I like tools that show you how things work.




[00:15:23] JF: So you’ve been talking about the different types of databases and then also the history of SQL, but this distributed SQL is kind of this different pathway. And I want to hear your thoughts on why this, what differentiates it from the past?


[00:15:43] JW: This concept of distributed SQL is something that I’ve been playing in my head for a couple years. When I first got introduced to Spencer, the CEO here at Cockroach Labs, I knew I was going to work here. We had this conversation about the architecture, what was going on in the database. It was just like six years ago. I’ve been working here for three and a half years. I just knew because the database has to change to take advantage of kind of modern infrastructure, flat out, and it’s not like let’s just change a layer of the database or let’s move something and make scale a little bit easier, let’s add auto-sharding to it. That’s not a solution. Because if you look at kind of modern cloud infrastructure, and we really start to move towards kind of cloud native systems, it’s a different paradigm. And the different paradigm really is rooted in one thing. And I think the easiest way I start to think about these things is in the old world, we thought about data, we thought about the logical model of our data. That’s great, right? We think about relationships and what the tables are or what the structure of my documents are going to be. That’s all really, really important. In this modern world, we have to start thinking about the physical model of our systems as well. And when we start thinking about the physical world on top of that logic, that’s where things get distributed. Now the masters of this who really figured this out was Google. If you want to see technology that’s going to be interesting over the next five to ten years, anything that’s interesting right now, look at those things that were kind of like a shortcut would be, look at all the little startups that started out, like open source projects coming out of Google because they figured this out. I mean, Kubernetes is a descendant of Borg. The document stores are all descendants of Bigtable.
Cockroach is inspired by Spanner, which is their relational database, that they wanted for global purposes. And so it is basically building on those core principles. There’s two engineers that I think every engineer should know about because I think they have fundamentally changed our lives as developers and as technologists. There’s Jeff Dean and Sanjay Ghemawat. Both of them are distinguished engineers at Google. And if you don’t know who they are, there’s a Wired article about these guys from about 10 years ago that I think everybody should read. Because it’s truly tremendous. You can look at their names in Google Scholar and the technologies that they have designed and architected, it’s awesome, you guys. MapReduce, TensorFlow, Spanner. We’re talking about huge, massive markets that have been created over the last 10 or 15 years. And their names are on it. And Eric Brewer is another one. And so when I think about distributed systems, this stuff is coming out as well. And if you look at what Google did, they didn’t say, “Hey, let’s take Bigtable and put transactions in it.” They didn’t say, “Hey, let’s take Postgres and modify it a little bit to do that.” No. They rearchitect and rebuild from the ground up with the principles of distributed systems baked into whatever it is they’re building. And to me, that’s awesome stuff. If you want a PhD in distributed systems, go check out some of the stuff that they’re doing. So I don’t know if I really got into kind of differences in distributed SQL. But I don’t know. I’m an evangelist. So I preach. Right?


[00:18:31] JF: Right. Introducing the physical layer of this because SQL, like you said, is an algebra, which is very logical, very abstract, you represent it on a chalkboard, but then you get into like, “And now you have to actually put the bits somewhere.”


[00:18:47] JW: That’s right.


[00:18:48] JF: That’s the physical world.


[00:18:49] JW: Right.


[00:18:49] JF: So I think that really did answer it.


[00:18:51] BH: Yeah. Like in the history of computers, maybe there is a time where the physical world matters a hundred times more than it does now because there are fewer abstractions and less built on top of it. But then maybe there’s a period where it might matter more now because of the cloud and distributed systems as kind of like the general span of how these things are kind of going.


[00:19:12] JW: Yeah, absolutely. Look guys, the speed of light is no joke. Okay? We can’t beat it. Period. So where data lives is actually pretty damn important because latencies are really difficult to get past. When it takes 400 or 500 milliseconds to go from New York to Sydney or whatever that exactly is and you have to go back and forth in a couple hops, what does that mean for somebody who’s waiting on your application? Is that a couple seconds and do people have patience to actually deal with that sort of thing? There was another engineer at Google. They came up with this concept called the 100 Millisecond Rule, back when they were building Gmail, and the 100 Millisecond Rule basically states like anything that takes longer than a hundred milliseconds, humans will discern some sort of delay. Anything under a hundred milliseconds, we can’t actually even see that as a delay. Right? Somebody told me that fighter pilots in jets do that. They use that same sort of rule because it’s such, I don’t know, I just saw Top Gun 2. Anyway, but it’s like super, super fast. Right? And so the speed of light’s no joke. So what does it mean? And talk about distributed transactions. So what does it mean when I have somebody trying to write a record in New York and somebody trying to write a record in Singapore at the same time and I’m in some financial ledger? Who wins, guys? And those are really, really difficult challenges to solve. And that’s why taking legacy tech, which was built for kind of like a stovepipe, single instance world, and just basically trying to attach some sort of distributed system to that, it takes a rethink. You need to reimagine the problem. You need to start taking some of these concepts and rethinking what that problem is. What is that challenge? Because when we first dealt with the challenge of a transaction 40 years ago, we weren’t thinking about doing it in two parts of the planet at the same time, guys. You know what I mean? It’s a different challenge.
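A back-of-the-envelope calculation shows why a few hops between continents can use up a 100 millisecond budget. The numbers below are assumptions for illustration and do not come from the episode: roughly 16,000 km between New York and Sydney along a great circle, and light travelling at about 200,000 km/s in optical fiber. Real network paths are longer and add routing and processing delay, so measured latencies are higher than this lower bound.

```python
# Assumed figures for illustration only (not from the episode).
DISTANCE_KM = 16_000            # rough New York to Sydney great-circle distance
FIBER_SPEED_KM_PER_S = 200_000  # light in optical fiber, about two thirds of c
PERCEPTION_BUDGET_MS = 100      # the "100 Millisecond Rule" discussed above


def one_way_ms(distance_km, speed_km_per_s=FIBER_SPEED_KM_PER_S):
    """Lower bound on one-way propagation delay in milliseconds."""
    return distance_km * 1000 / speed_km_per_s


def total_ms(distance_km, round_trips):
    """Propagation delay for a number of full round trips."""
    return 2 * one_way_ms(distance_km) * round_trips


def max_round_trips_within_budget(distance_km, budget_ms=PERCEPTION_BUDGET_MS):
    """How many full round trips fit inside the latency budget."""
    return int(budget_ms // (2 * one_way_ms(distance_km)))


for trips in (1, 2, 3):
    print(f"{trips} round trip(s): {total_ms(DISTANCE_KM, trips):.0f} ms")
print("round trips within budget:", max_round_trips_within_budget(DISTANCE_KM))
```

Under these assumptions even a single round trip exceeds the budget before the database does any work, which is why where the data lives matters as much as how fast the database is.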


[00:20:54] BH: So a big part of the fundamental challenge for the database is achieving consistency in a very challenging environment. But then you also talked about latency, which is if you just need consistency, you don’t necessarily care about the latency in the same way, but is that latency and that opportunity to deliver a faster experience, those are sort of two challenges one has to solve for, is that correct? And are there other sorts of concepts? So like consistency, the data eventually has to be correct and hopefully quickly, like within the needs of the business. And then you brought up latency though. I wonder how much that plays a role in the actual decision making at the database layer and what else is there?


[00:21:39] JW: That’s a big deal. The battle of latency versus consistency is a big problem, right? That is the battle, Ben. You can’t fight the speed of light. I would love to be here when we do by the way, but I don’t think it’s going to happen in my lifetime. I would love to see how life changes. But if you think about consistency too, I don’t know, you two are developers. Do you guys know what isolation levels are in a database?


[00:21:57] JF: I am not familiar with that.


[00:21:58] JW: So literally, our databases all have this concept. It’s like the I in ACID, right? It actually stands for isolation and isolation levels are in different databases. You can set basically the parameters of how consistent your transactions are and how the database functions. Basically it was there. It’s been in place to basically to tune performance of a database, but with different isolation levels, what happens is different problems can go wrong with your data. You will have an eventually consistent database, which just means like transactions can come in at some time and not get the right data or you can write data and that’s never there. And so what can go wrong with data is really weird. Basically at Cockroach, what we’ve done is we said, “Look, let’s just take the decision of isolation level out of the hand of the developer. Let’s just make it serializable isolation, which is basically saying all transactions are going to be guaranteed correct. Period. Flat out. Let’s not even allow that to be some sort of tuning mechanism.” Okay. So if we’re going to do that, then we’re going to have to fight the side of the latency, Ben. Right? So what we do is in Cockroach, in particular, we don’t synchronize data across places. We use something called distributed consensus. And again, this is another area. If people aren’t familiar with Raft or Paxos, which are kind of these distributed consensus algorithms, go check them out. And if you want a really quick shot at Raft, there’s this website called, which has this really wonderful graphical representation. I don’t know who did it. I want to meet the person because I’ve sent probably about eleventy thousand people to this site. It's crazy, but it’s super cool. And what we’re doing is when we write to the database, we’re actually writing in triplicate. You can write five times, seven times, some odd number, and we’re going to get a quorum, right? 
And we write three times, but where we write that data is important because, Ben, like you’re saying, what’s the tradeoff of latency and resilience? That’s what you have to think about. This physical nature of data is making you think about the speed at which you’re going to access it and what you can survive. And in the cloud, that’s really important. So if I could have all three copies of my data live in Germany because my user is in Germany, great, cool. They’re going to get really fast access to that data. But maybe I have one copy in Germany, one copy in the UK and one copy on the East Coast of the United States, so I can survive the failure of an entire region going down. I still have two copies of that data. Right? I have a quorum. I have two of three, right? And so this is the kind of stuff in distributed systems where you’re playing with how data actually works within these systems to fight those things. So yes, consistency is important. Latency is important. Resilience gets really, really important. And then how do you scale these systems? I think those are the kind of four vectors that we think about all the time here at Cockroach.
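The quorum arithmetic behind "two of three" can be sketched in a few lines. This is only a counting exercise under simplified assumptions, not an implementation of Raft or Paxos and not CockroachDB code; the function names and region labels are hypothetical.

```python
def quorum_size(replicas):
    """Smallest majority of a replica set."""
    return replicas // 2 + 1


def tolerated_failures(replicas):
    """How many replicas can be lost while a majority still survives."""
    return replicas - quorum_size(replicas)


def can_commit(replica_regions, failed_regions):
    """True if the replicas outside the failed regions still form a majority."""
    alive = sum(1 for region in replica_regions if region not in failed_regions)
    return alive >= quorum_size(len(replica_regions))


# Odd replica counts, as described above: 3, 5, 7.
for n in (3, 5, 7):
    print(n, "replicas: quorum", quorum_size(n), "tolerates", tolerated_failures(n))

# Hypothetical placements for the Germany example.
spread_out = ["germany", "uk", "us-east"]
all_local = ["germany", "germany", "germany"]

print(can_commit(spread_out, {"germany"}))  # two of three copies remain
print(can_commit(all_local, {"germany"}))   # every copy was in the failed region
```

The same counting explains the tradeoff in the passage: keeping all three copies in Germany gives the lowest latency for German users but survives no regional outage, while spreading them out survives one region at the cost of cross-region round trips on writes.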


[00:24:27] BH: Which of these challenges bubbles up to the SQL itself as being part of the evolution of SQL as far as I might care about as a developer, beyond just the implementation? Which parts of the challenges make their way up to kind of my interaction with the SQL?


[00:24:43] JW: So I mean, for the kind of casual everyday developer, this shouldn’t make a whole lot of changes to them. However, when you start to deal with a distributed database, you start to think a little bit differently. You really do. The way that primary keys are set up in your table actually has a very big impact on how data gets distributed in the back end. That’s a whole other level of architecture conversation I don’t want to get too deep into, but actually primary keys and not having sequential IDs is a big deal. You end up with something called a hot range. So that’s another story. But this physical location of data actually is something that’s important to people. If I have a distributed database and I have 10 endpoints for that one single logical database, am I going to like batch write a hundred thousand records against one node? No, use all 10 endpoints. And so batch it out into 10,000 each and then use the 10 endpoints. It’s a paradigm shift. It’s a paradigm shift to becoming distributed for the developer. And when they start to understand the power of that database or why something didn’t work a certain way, they start to learn some things. The other place that this does actually come into play a little bit, and we’ve made this really simple here, like with a simple alter table command, I can change the face of a table and say, “Hey, where does this data live in that system?” We’ve made it really dead easy for people to start to understand where data lives. And so for us, we had to add DML to what is kind of normal SQL syntax and that alter table command so that we can make it really simple for people to be like, “I want to be able to survive the failure of any region for this table,” or, “I want fast access to that data for this particular table.” We do that now in DML.
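Two of the ideas above, splitting a bulk write across endpoints and avoiding a hot range, can be sketched as follows. Everything here is a simplified assumption for illustration: the endpoint names are made up, the key space is divided into ten equal ranges, and the model ignores how a real distributed database splits and rebalances ranges.

```python
import bisect
import random


def split_batches(records, endpoints):
    """Give each endpoint one contiguous, roughly equal batch of records."""
    size = -(-len(records) // len(endpoints))  # ceiling division
    return {
        endpoint: records[i * size:(i + 1) * size]
        for i, endpoint in enumerate(endpoints)
    }


def range_index(key, split_points):
    """Index of the key range that a key falls into."""
    return bisect.bisect_right(split_points, key)


# 100,000 records across 10 hypothetical endpoints: 10,000 each.
records = list(range(100_000))
endpoints = [f"node-{i}" for i in range(10)]
batches = split_batches(records, endpoints)
print({endpoint: len(batch) for endpoint, batch in batches.items()})

# Toy model of a key space cut into 10 equal ranges.
KEY_SPACE = 1_000_000
split_points = [KEY_SPACE * i // 10 for i in range(1, 10)]

# Sequential IDs: every new key is just past the previous one,
# so a burst of inserts lands in a single range.
sequential_keys = range(950_000, 951_000)
sequential_ranges = {range_index(k, split_points) for k in sequential_keys}

# Randomly generated keys: the same burst spreads over many ranges.
rng = random.Random(0)
random_keys = [rng.randrange(KEY_SPACE) for _ in range(1_000)]
random_ranges = {range_index(k, split_points) for k in random_keys}

print(len(sequential_ranges), len(random_ranges))
```

In this toy model the sequential burst touches one range while the random keys touch many, which is the intuition behind preferring non-sequential primary keys when writes should be spread across nodes.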


[00:26:14] JF: Can you expand on what you mean by DML?


[00:26:17] JW: Yeah. So within a database, there’s kind of two things. There’s DDL, which is Data Definition Language, and there’s DML, which is Data Manipulation Language. So DML is basically like understanding, kind of like manipulating where the data’s going to live for each table. And so this is kind of one of those core concepts in SQL.


[00:26:34] BH: Cockroach is not such an old thing, but it’s been around long enough to the point where from my perspective, every single cloud platform these days seems to have edge compute as an offering these days. They’re fighting for the cleverest implementation of the notion of like, “Do your computing on the edge in some capacity.” What’s it been like building a layer relating to that kind of challenge, but also seeing so much change at the computation or where physical infrastructure seems to be so much more of every new thing? So if you’re building something that’s 10 years old, maybe it doesn’t matter to you. But if you’re starting from scratch, you’re probably going to consider some of these choices. What’s it been kind of co-evolving with a lot of fast moving ideas?


[00:27:28] JW: I would say the past six years in particular, the amount of change that we’ve all seen, it’s the most exciting time of my entire career. Absolutely. Think about this. Our entire software delivery supply chain has completely changed. Everything is better. I think it’s more complex at times, and we’re doing things in YAML that we could have 10 lines of YAML code that was a process like two days for somebody to do, you guys, like the abstractions we have are awesome. To be a part of this, both here and my previous work at CoreOS and that kind of stuff, this is nothing but fun, but I think we’re only at the beginning of it. We really are. To me, the change of everything is like the level of automation that we’re applying to things right now is just truly tremendous. I’m proud and honored to be at a database company that’s about seven or eight years old because, again, databases are not easy to build. There’s a guy, another kind of Turing Award winner, super luminary in databases, a guy by the name of Michael Stonebraker. Stonebraker created a company called Ingres. He created Postgres. That guy is brilliant. He said at some point that it takes seven to eight years for a database to fully gestate before it’s actually something that’s going to be reliable for kind of enterprise type workloads. And he’s right. These things take time. A shiny new object, like we were talking about, Ben, might work for certain things. You know what I mean? But for databases, like this stuff, it’s really, really difficult to do. And so for us, the timing and everything is kind of coalescing together at the same time. But think about seven, eight years ago, you guys. How many people were really using the cloud, like really? Really, seven, eight years? 2015, 2014, it was still kind of like, “Oh yeah, we’re going to get there.” Man, there were still people with data center rooms in their buildings. Those things are gone now, completely.
And so that evolution to really kind of cloud and I think this oncoming truly kind of different paradigm shift to cloud native, I think we’re just at the beginning of this game. I really do. And I think the next five years is going to be even more interesting than the last five. But for us, it’s fun to be a part of it because we’re living in this world of… the CEO of CoreOS used to call it GIFEE, which is Google Infrastructure For Everyone Else, and I think we’re all starting to live in that world where it’s kind of all becoming cloud native, I think.


[00:29:39] JF: Let’s call back to like the faster than light speed that you were talking about. You’d love to be there for that. And you’re talking about us being at the start of something spectacular. And here you are, an evangelist, a preacher of technology, and I kind of would love to hear some of this prognostication that gets you to this state of faster than light, which we can’t get to. But I want to know, where do you see this going even further?


[00:30:08] JW: Yeah. So here’s another thing I wasn’t really a believer in about 8 months ago, 10 months ago. I really wasn’t a believer in serverless. I didn’t get it really, I guess. I just didn’t really understand what this stuff really is ultimately going to be. But I think we’re starting to get to the point where we can start to abstract away infrastructure from developers to the point where they don’t even need to think about it. Like for us at Cockroach, Spencer and I have these conversations about what we think the future of the database needs to be. And to me, the future of the database, what if the database was as simple as a SQL API in the cloud? Connection string in the sky, dude, super simple, like seven or eight endpoints around the planet, which we’re guaranteeing sub 400 or 500 millisecond access to data or maybe 50 millisecond access to data, and I would just code against endpoints that are across the world. I didn’t have to worry about upgrading that database or scaling it or building active/passive resilience literally. What if we could just do that? When you eliminate the concept of infrastructure completely for the developer and have them code against things like the database, just like they code against Twilio, for whatever API they need there, just like we code against Okta, whatever, like literally. And I think we’re at the point where we could start to do these things and we’re starting to see it happen. We do work with the Google Cloud Run team. I love Google Cloud Run. I think it’s really super cool, like containerize an app, throw it on Cloud Run, it scales up and down based on workload. That’s cool. I don’t want to deal with if that stuff is involved. Who wants to deal with that? Who in the ops team wants to deal with that stuff? Nobody wants to. We can automate scale. We can automate out resilience into systems too. We can automate out downtime.
And so I think we’re getting to the point where we’re going to live in that world sooner than later, I think. I don’t like the kind of term serverless. I like the term infrastructureless. And I think there’s going to be a group of people who have to do that because the other side of this by the way is there’s not a whole lot of people who get this infrastructure and it’s hard to find them and it’s hard to hire them. So I think we’re going to have to go that way eventually.


[00:32:09] JF: That fed right into my next question of total reliance on the magic, the infrastructureless, which hearkens back to the statement about, I forget if we made this on mic, that we don’t teach SQL in college anymore. I’ve seen a lot of Rails developers and SQL is paralytic. What are the consequences you see of us moving towards that? It’s not technically a single point of failure, but it is a conceptual single point of failure.


[00:32:44] JW: Mine is a little bit more, I guess, philosophical. I worry about the laziness of developers. Or if we make it that simple, are they not investigating the architecture of what’s underneath anymore? And this comes back to what we were talking about before. If you’re going to use the right tool for the right job, you got to understand how it works. And so if we abstract the complexity completely out of things, are we going to abstract people away enough that these things are just way too simple? I do worry about things like that, you guys, because I think that’s one of those things. I mean, I grew up in an era where you had to actually understand how the things internally work. I learned Assembler. I don’t know if they even teach Assembler in college anymore, you guys. It’s really powerful that I know what that is because it actually allows me to understand certain things in tech that, I don’t know, I’m not sure everybody really gets. And so these abstractions that we continue to do, I think we’ll have a further divide between developers and software engineers. And I really do think there’s two different professions. There’s developers and there’s software engineers out there.


[00:33:43] JF: Well, you also highlighted like it’s hard to hire the infrastructure people. Now make it infrastructureless and now try to hire, grow, develop those folks, you’re talking PhDs plus of being on top of it and kind of that hyper dependence on a built skill set. And what does that mean in a free and open source movement?


[00:34:13] JW: I don’t know. That’s a tough one. You know what I mean? The beauty of open source is it basically allows the collective to come up with something. Think about the advances we’ve made just in software, just because of open source. There’s no way we would get to where we’re at right now. The concept of Git and this whole movement towards the way that we think. Yeah, I used CVS, like Subversion, what a nightmare. Just think how simple things are. And it’s basically the collective energy of all of us coming together to make that happen, there’s something there too. I think that’s probably the other side of exactly what we’re talking about. Right?


[00:35:05] BH: A lot of artful software development presents a clean interface and an abstraction, but also makes diving a little bit deeper possible for anyone who needs to. And when we’re talking about ORM to SQL, that’s the perfect example. Like if you’re going to build an ORM, it’s likely you want to make this SQL opportunity accessible for anyone who might possibly want to use it. And if that’s the case, the code only gets as ugly as some extra SQL and isn’t a further pain. What are some of those kinds of concepts as we get into some of this modern distributed architecture where we don’t want to expose all the complexity, but you don’t want to make it impossible to change something when the complexity needs to be dealt with? Talk about just the art of dealing with some of those problems.


[00:36:03] JW: It’s a great question, Ben. Look, genericize the mundane, but give people the power to do the complex. That’s exactly what we’re talking about here. Like, “How do I genericize and automate out the mundane concepts and the mundane tasks that I have?” It’s like in Kubernetes, like, “Okay, how easy is it to really scale up the number of pods that I have?” That’s pretty simple and straightforward. However, for a lot of different pieces of software within Kubernetes running a rolling upgrade, which is kind of a little bit more complex, they need to build an operator to do that. Right? And so that’s a good example of kind of one. That was the one that just first came to my mind. Why did we have operators in Kubernetes? Well, it was basically to get more complex tasks completed and to automate those things. And so we’re exposing the API, right? And so people could do these things. But giving them the construct to actually run these things within that API, I think that was the power of operators in that ecosystem, for sure. But the mundane tasks are relatively straightforward to basically take care of, right? Relatively is kind of an interesting word in the concept of this current topic, but I think that’s the power. How do you abstract out those things that are going to be mundane versus those things that I do want access to? We deal with this all the time. I was talking about, “Okay, what would a SQL API in the cloud look like, guys?” I just have gets and puts and there’s some REST interface and I’m getting data, which is just a select statement, and I’m putting data. Like PostgREST, which is this open source project I think is pretty cool. It’s like a REST interface on top of Postgres. It works in front of Cockroach actually. We’re wire compatible. So we use it too. So gets and puts are pretty simple. Can I still have a connection string where I can connect to the database and ask more complex queries and construct these really complex SQL statements?
Oh, yeah. You know what I mean? And I think a combination of those two things is a good example of that.
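
To make the "a get is just a select statement" idea concrete, here is a rough sketch, not how any particular REST layer is actually implemented. The function `get_to_select` and the `customer` table are hypothetical; the `column=eq.value` filter shape is borrowed from PostgREST's URL style, and only that one operator is handled. SQLite stands in for the database.

```python
import re
import sqlite3
from urllib.parse import parse_qsl, urlsplit

# Only plain identifiers are accepted, since names cannot be bound as parameters.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def get_to_select(url):
    """Translate a GET-style URL into a parameterized SELECT (sketch only)."""
    parts = urlsplit(url)
    table = parts.path.strip("/")
    if not _IDENT.fullmatch(table):
        raise ValueError(f"bad table name: {table!r}")
    clauses, params = [], []
    for column, value in parse_qsl(parts.query):
        if not _IDENT.fullmatch(column) or not value.startswith("eq."):
            raise ValueError("this sketch only supports column=eq.value filters")
        clauses.append(f"{column} = ?")
        params.append(value[len("eq."):])
    sql = f"SELECT * FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


# Demo against an in-memory SQLite table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id TEXT PRIMARY KEY, region TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("c1", "us-east"), ("c2", "eu-west")],
)

sql, params = get_to_select("/customer?region=eq.us-east")
result = conn.execute(sql, params).fetchall()
print(sql, params, result)
```

The simple path covers the mundane reads, while a normal connection string remains available for the complex SQL that a URL filter cannot express.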


[00:37:49] BH: Can you define wire compatibility? You’ve used that term a couple times.


[00:37:52] JW: There’s how you communicate and then there’s what you can say. SQL syntax is select, star, from, customer. That’s the syntax. That’s like what you can ask. Wire compatibility is basically like all the drivers that work with Postgres are going to work with Cockroach for the most part. They can connect to it. Right? So I can take pgAdmin and put it in front of Cockroach and it’s going to connect. Will all the capabilities in pgAdmin work? I don’t know. We haven’t tested the crazy amount of capability that’s in that thing. Right? It’s pretty awesome. Our constructs in Cockroach are a little bit different than normal Postgres. Yeah, a little bit. And that’s the syntax side. So we’re definitely wire compatible. And from a syntax point of view, we’re pretty deep as well.
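
One practical consequence of wire compatibility is that a client is pointed at either system with the same kind of connection URL. The sketch below only parses two example URLs and opens no connection; the user, host, and database names are assumptions, and the ports are the conventional defaults (5432 for Postgres, 26257 for CockroachDB).

```python
from urllib.parse import urlsplit

# Hypothetical connection URLs; only the port differs between the two.
POSTGRES_URL = "postgresql://app_user@localhost:5432/appdb"
COCKROACH_URL = "postgresql://app_user@localhost:26257/appdb"


def describe(url):
    """Break a connection URL into the pieces a driver would use."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


pg = describe(POSTGRES_URL)
crdb = describe(COCKROACH_URL)
print(pg)
print(crdb)

# A Postgres driver would receive either URL unchanged; which SQL features
# behave identically afterwards is the separate "syntax" question.
```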


[00:38:36] JF: If I want to pivot out Postgres and use CockroachDB, how am I going to go about doing that?


[00:38:41] JW: So the first thing I say to people every single time they ask me about like, “Hey, can I just migrate to Cockroach?” I’ll be like, “Any person that ever tells you a migration is easy is smoking something good, dude. No way. All migrations are difficult.” There’s a couple things you have to rethink. And this comes back to I think one of our earlier questions, is like, “What does it mean to that developer as they kind of work with a distributed database?” It’s a paradigm shift. It really is, guys. And so you have to start thinking differently about things, like how your primary keys are in that table is a little bit different. Thinking about your latency versus resilience goals on each table. Within each table, we can set different goals for how fast you can access that or where you want it to live so you can serve what’s your failure domain for each table. And thinking through those things is one of those things that’s very different. The third part that trips people up is if you’re living in this world of stored procedures, that’s cool. I get it. I understand why we had those. It’s an anti-pattern of the future. We actually haven’t implemented stored procedures in Cockroach because we’re too focused on all the other cool stuff that we can automate out. Do we have a pattern and a design to do this? Yeah. But we’re not going to do stored procedures. We’ll do something like distributed data functions, like imagine serverless functions, but combined with a database kind of thing, because that’s what we need. I don’t want to just blindly take something from the past and do it because it’s in some app I want to migrate. You know what I mean? If we’re going to move to cloud and we’re going to move to cloud native, let’s be cloud native. Let’s do that. You know what I mean? And so for us, it’s not always a one-to-one. There’s definitely some things you got to think about. To me, it’s the challenge of the paradigm shift.
It’s the challenge of overlaying that physical model on top of the logic that we’ve all been thinking for years and years, and I think that’s the trick.
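
The point about primary keys can be illustrated with a toy model. Assume, purely for illustration, that the key space is cut into 8 fixed, equal-width ranges ordered by key; real systems split ranges dynamically, so this is a simplification. Keys that sit next to each other in key order, such as sequential IDs, land in the same range, while randomly generated keys spread out.

```python
import random

KEY_BITS = 128   # treat every key as a 128-bit integer, like a UUID
NUM_RANGES = 8   # hypothetical fixed number of equal-width ranges


def range_for(key):
    """Map a key to a range index by its position in key order."""
    return (key * NUM_RANGES) >> KEY_BITS


def writes_per_range(keys):
    """Count how many writes each range would receive."""
    counts = [0] * NUM_RANGES
    for key in keys:
        counts[range_for(key)] += 1
    return counts


# 1,000 sequential IDs versus 1,000 random 128-bit keys (seeded for repeatability).
sequential_keys = range(1, 1001)
rng = random.Random(42)
random_keys = [rng.getrandbits(KEY_BITS) for _ in range(1000)]

sequential_load = writes_per_range(sequential_keys)
random_load = writes_per_range(random_keys)

print("sequential:", sequential_load)  # every write hits a single range
print("random:    ", random_load)      # writes spread across ranges
```

In this model the sequential workload sends all of its writes to one range, which is the kind of concentration the "hot range" term describes.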


[00:40:26] BH: I’m curious, actually that brought up some curiosity about the governance of CockroachDB. If a massive company that needed stored procedures for some reason came along, is there a price and is that even kind of possible, like Cockroach Labs versus CockroachDB? Certainly no one wants to go backwards, but a lot of compromises happen when you’re talking enterprise software and that sort of thing. Hypothetically, how would those compromises come to be?


[00:40:57] JW: Super awesome question. I love that question more than anything. We have super huge companies who have asked for it. We have them. Our logo list, people who work in Cockroach, it’s just awesome, dude. It’s amazing. Sure, we’ve been asked for it. No, we’re not doing it. We’re going to do it in the right way, what it needs to be. And I got to tell you y’all, I’ve worked for some sociopaths in my life. In my career, I have worked for some pretty insane CEOs to do whatever because they’re chasing whatever that is. I really do believe this company is not chasing some sort of financial rainbow somewhere. We want to build a database that works today, but I want to build the database that’s right for three years from now, five years from now, ten years from now. What does the relational database need to look like in the future? And for us, that’s what we’re building. And that’s why I’m not going to take something from the past and just modify it and put like auto-sharding around it or take one layer and change it. No. If we’re going to do this, let’s do it right because I do believe that all of this has to change. And so yeah, will we do something like stored procedures? Yeah, but we’re going to do it right. You know what I mean? Like let’s do it right for what it needs to be. And if it means we have to, I guess, what is it, like rob Peter to pay Paul? Is that the kind of thing? We’re not going to go just simply chase something because it’s such a huge opportunity. And you know what? I think our customers appreciate that more than anything. I do as an employee, for sure, because I want to work somewhere cool. I want to build cool stuff. I love talking about it. I’m a believer in it obviously. I’m an evangelist now, guys. I got to go and preach. So if I don’t believe in it, I can’t do it. It’s fun building stuff that we believe in.


[00:42:30] BH: Well, thanks for coming on the show.


[00:42:31] JF: Yeah. Thank you.


[00:42:32] JW: Yeah. Thanks for having me, guys.


[00:42:42] BH: Thank you for listening to DevDiscuss. This show is produced by Gabe Segura. Our senior producer is Levi Sharpe. Editorial oversight by Jess Lee, Peter Frank, and Saron Yitbarek. Our theme song is by Slow Biz. If you have any questions or comments, email [email protected] and make sure to join our DevDiscuss Twitter chats on Tuesdays at 9:00 PM Eastern Time. Or if you want to start your own discussion, write a post on DEV using the tag “discuss”. Please rate and subscribe to this show on Apple Podcasts.