Talking shop with a couple of heroes.
In this episode, we talk about solving problems via Amazon Web Services with Ken Collins, AWS Serverless Hero and staff engineer at Custom Ink, and Vlad Ionescu, AWS Container Hero and DevOps consultant.
Ben Halpern is co-founder and webmaster of DEV/Forem.
Christina Gorton is a Developer Advocate at Forem. She is a LinkedIn Instructor and technical writer.
Ken Collins is an AWS serverless hero and principal engineer at Custom Ink where he focuses on growing their DevOps culture within the Ecommerce teams. Custom Ink is approaching its 20th year in business and is entering its second phase of Cloud adoption, where he helps a growing platform technology team succeed using AWS-first well-architected patterns.
Vlad Ionescu is a consultant focused on getting companies to high-performing levels. He is an AWS Container Hero and focuses extensively on developer velocity.
[00:00:01] KC: For the love of God, this stuff is hard and I don’t want to worry about things. And fun fact, Fargate and Lambda run on the same technology, I believe, that’s called Firecracker.
[00:00:12] VI: Yeah. We’re not lazy. We’re energy efficient.
[00:00:28] BH: Welcome to DevDiscuss, the show where we cover the burning topics that impact all of our lives as developers. I’m Ben Halpern, a Co-Founder of Forem.
[00:00:29] CG: And I’m Christina Gorton, Developer Advocate at Forem. Today, we’re talking about Amazon Web Services with Ken Collins, AWS Serverless Hero and Staff Engineer at Custom Ink, and Vlad Ionescu, AWS Container Hero and DevOps Consultant. Thank you both for joining us.
[00:00:53] KC: Thank you.
[00:00:54] VI: Thank you. It’s an honor to be here.
[00:00:55] BH: All right. So AWS is a big topic. And today, we are going to cover all of it, but that’s a lot to get into. I mean, despite it being, we’re just talking about one specific cloud provider, it’s obviously a huge part of the developer industry as a whole. So absolutely, this type of episode is really valuable to cover all sorts of that stuff. But before we get into that, can you start us off Ken, by getting into your background as a technical worker and developer?
[00:01:26] KC: I’m about I think 13 or maybe 14 years into my programming career. I’m almost 50 years old now. So I started late as a software engineer. Prior to that, I was like a graphic artist and a middle manager at an ad agency and a marketing director. And I’ve always approached technology and programming from a Ruby on Rails background. So that was my first language. And I’d say after about 10 years of getting really good, I’ve done a lot of open source work with the Rails community. I maintain the SQL Server Adapter for a while. Well, I was proud enough to represent the Ruby community along with a couple of other folks from the Python community when Microsoft was first doing well with reaching out to open source. But here, lately, at Custom Ink, I’ve been there for about eight years and over the past few years have sort of dedicated my experience to sort of retooling my career, getting completely unfamiliar with how to do anything right, and learning all about AWS and how we can adapt AWS at Custom Ink to sort of like complete our second journey into the cloud. We went into the cloud about four years ago, but then there’s a desire to use the cloud more often and natively. And I started learning Lambda about a couple of years ago. And then maybe about a year or so ago, AWS recognized the work that I had done with Ruby and Rails, specifically in Lambda, and asked me to become a hero. And I’ve been advocating for Lambda since then and having a really good time at it.
[00:02:49] CG: So you’ve been at Custom Ink you said around eight years, which is a really long time for a lot of engineers to stay in one company. Can you talk at all about what Custom Ink is specifically and a little bit more? You just described some of it, but a little bit more about what your jobs look like over the past couple of years.
[00:03:06] KC: Yeah. That’s a really good question too because I think a lot of times in dev advocacy we talk about things that are sort of close to us and how our company solves things. And a lot of those problems are very contextual, right? What works for Custom Ink may not work for other people. So the company is about 20 years in business, have used Rails since Rails 1. So you can imagine we have a lot of Rails apps. When I first joined, there were just two big ones and now there’s hundreds and a lot of my career there has been helping us adapt and sort of modernize the architecture. And that’s really fun at a company that is moving the architecture forward. Not because of some sort of diagrams that we’ve written that said, “This looks good,” it’s because sort of like our business success has forced those ley lines of the architecture to be more understood where we have to abstract things out and stuff. So I work across different business units and teams to sort of tie together a technical strategy in many cases, and I think that’s a lot of what’s been over the past few years with a little bit of emphasis maybe on SRE and new business venture unlocks and stuff like that.
[00:04:11] BH: And Vlad, how about you? Get into your background.
[00:04:13] VI: I’m a DevOps consultant. I like to joke that I’m basically a professional mistake avoider, like that’s everything that I do. It’s surprisingly accurate. I have a background in finance. And for the last four to five years, I’ve been doing work as a consultant. I’ve been mainly focusing on containers and serverless with an emphasis on developer velocity and observability. I think those are really, really, really important. As we all know, usually in tech, speed of iteration is better than quality of iteration and improving on that and empowering teams to move faster has been really fulfilling for me.
[00:04:54] BH: So both of you specialize in AWS. You’re both AWS heroes and I’m curious, why the depth and specialization into AWS as opposed to maybe like broader specialties? Or what about your careers and your interests have taken you in this direction?
[00:05:18] KC: I think Lambda was the first function as a service that really sort of hit the market. I could be wrong on that, but I think that was around 2014. That was also back around when microservices were really hot. I also think I was just sort of a classical Rails developer who has worked with a Rails app and an S3 bucket. Right? So it’s kind of easy to say if your whole cloud experience is, “Yeah, I know how to make an S3 bucket. I know how to put things in a bucket and get things out of a bucket.” And that’s AWS. And then sort of like if you’ve heard about this thing called serverless or AWS Lambda, then for me, it was really easy to sort of anchor to AWS because that’s where our infrastructure was moving to, and it just seemed like AWS had been out there longer. So whenever I wanted to learn to program the cloud, that was a natural choice for me.
[00:06:07] BH: As a Rails developer, what about Lambda appealed to you or your organization? What led to the quick adoption of that idea?
[00:06:17] KC: We have a testimonial on AWS’ page for Lambda and that was in 2017 where we took our design lab where people put fonts and clip art and upload together for their T-shirt designs. And we had to optimize a particular Rails app that rendered the clip art, so as users would make change and they would drag to resize it. And that was written in 2017. There wasn’t a Ruby on Lambda then. So a lot of our experience with that was basically how do we take these big Rails apps that we have when it went from one to two and then maybe from two to twelve. And one of those twelve was really hurting for scalability and just pull out a small part of that and hyper optimize it on this sort of new compute paradigm. That meant we had to rewrite it in a different language, but I think that’s mature. That’ll happen as you get bigger and things need to break up and re-platform and stuff that makes a lot of sense.
[00:07:08] CG: And why was AWS your main choice for cloud services, Vlad?
[00:07:11] VI: I just started with AWS in college actually. We had an AWS class, which was super, super nice, and I found it fun. It wasn’t the main cloud I was working with at the time. I was very much having fun with DigitalOcean. I was writing my own email server because I was a masochist, I guess. But when the time came to do my thesis, I didn’t want to particularly optimize my code because I wanted to graduate, and I realized, “Hang on, I can just give AWS five dollars. They’re going to give me a machine with two terabytes of memory. I can run my optimized code. It’s going to finish running in half an hour.” And that’s it, it’s five bucks, I have the privilege of just doing that. I was like, “Oh, there’s some power there. That’s interesting.” And we just snowballed from there. I got into running stuff at scale from my background in finance and I needed to specialize on something. I don’t know all of AWS. For the record, AWS has what? Two hundred something services.
[00:08:15] CG: Yeah. No one does.
[00:08:15] VI: Yeah. I don’t know all of them. No. No, I didn’t know. I know a subset of them and I’m working with them because I’m enjoying it and right now it’s one of the best ways to build applications and to build a company, to actually deliver value to people. That’s not to say that other clouds aren’t awesome. They are, but I’m just doing AWS.
[00:08:37] BH: So we’re going to give Ken a chance to really expand and explain Lambda a little bit for us because we’ve already touched on that a few times so far. But Vlad, can we start with you and allow you to get into some of your favorite AWS services or just ones that stand out to you as worth explaining to the listeners who might only have a cursory understanding of some of the services out there? So maybe S3 is well-known enough as a storage service that a lot of folks know about it. Can you get into some of the services you use and find success with and maybe give a brief overview of a handful?
[00:09:14] VI: Yeah. S3, unlimited storage in the cloud. It’s object storage, it’s not block storage. So that means that if you have a hundred-gig file and you want to change ten bytes in it, like one character, you’ve got to download and re-upload the whole file. There’s no tiny change in that. But when it first came out, it was one of the first services. It was so shocking. You had unlimited, virtually unlimited because nothing is unlimited, but unlimited storage. That was absolutely awesome. It’s one of the fundamental building blocks of AWS. Anything that uses storage pretty much uses S3 at one point. And that actually moves me into the next service because I didn’t know what I wanted to talk about initially. But Fargate, let’s talk about Fargate. So Fargate came out three years ago at re:Invent and it was a terrible service. I’m sorry. It was a terrible service. It was not at all a good service. It was slow and very, very expensive. So what actually is Fargate? Fargate is a “serverless” way of running containers. So you basically just throw a container to AWS and tell them, “Hey, run this for me.” They’re going to handle everything. They are going to handle what node it runs on, what operating system it has, patches for that, upgrades, scaling, fixing old nodes, fixing nodes that have problems and so on and so forth. And it’s super, super, super easy to get started with it because you don’t have to do that node maintenance. We had so many vulnerabilities, especially in the last year. Let’s just pick one I know. There was a vulnerability in sudo where anybody could get root on your machine. How many people actually patched all their instances for that vulnerability? Like the hopeful part of me wants to say a lot. Based on the customers I’m working with and what I’m seeing in reality, the number is very, very, very low. You don’t have to worry about that. You don’t have to care about the OS. You don’t have to care about anything.
You just care about your container. And I love that. As I was saying, it came out three-ish years ago. Don’t quote me on that. And it was very, very expensive and it was very, very, very slow. It was not a good service at all. But then AWS, as it does, they released a bare-bones service. So something that is very much an MVP, and then they built on top of that. They cut the pricing twice. So they had two price discounts for it. They added the Spot option, which is even more discounted, but they can terminate your containers with a two-minute warning. If they’re going to launch a new one, you’re not really stressing about it that much, and they made it super, super, super, super fast. That’s amazing. It’s one of those services that, and AWS is hating me for saying this, it’s deprecating EC2. So right now, if you’re starting in a cloud, don’t use EC2s, but there are still many valid use cases for that when you need that flexibility. For example, right now, we cannot do Windows on Fargate. You cannot use GPUs. You’re limited to a certain number of CPU cores and a certain number of gigabytes of memory. If you need more, you can go to EC2, that’s always an option you have. But if you don’t need that, you can just not stress about it and get an awesome, awesome service. And yeah, Fargate is one of my favorite services.
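The object-storage point from this answer (changing a few bytes means re-uploading the whole object) can be sketched with a toy store. This is only an illustration: `ToyObjectStore` and `patch_object` are hypothetical names invented here, and nothing below calls the real S3 API.

```python
class ToyObjectStore:
    """In-memory stand-in for an object store: objects are read and written whole."""

    def __init__(self):
        self._objects = {}
        self.bytes_uploaded = 0  # running total, to make the cost of a small edit visible

    def put(self, key: str, data: bytes) -> None:
        # Every write replaces the entire object.
        self._objects[key] = bytes(data)
        self.bytes_uploaded += len(data)

    def get(self, key: str) -> bytes:
        return self._objects[key]


def patch_object(store: ToyObjectStore, key: str, offset: int, new_bytes: bytes) -> None:
    """Change a few bytes by downloading, editing locally, and re-uploading everything."""
    data = bytearray(store.get(key))
    data[offset:offset + len(new_bytes)] = new_bytes
    store.put(key, bytes(data))


store = ToyObjectStore()
store.put("report.txt", b"x" * 1000)
patch_object(store, "report.txt", 10, b"HELLO")
```

With a block device the five-byte edit would touch only the affected block; in this sketch the store sees the full object uploaded a second time, which is the trade-off being described.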
[00:12:44] CG: So as a little bit of a follow up to kind of what you’re talking about, and you’ve mentioned some things like with EC2 some sticking points. As I’ve learned like AWS, there have been things that I’ve seen are hard and I’ve written about for people. What are some things that you think are challenges for people when they’re getting into AWS or into the cloud in general?
[00:13:03] VI: Everything gets harder. I’m sorry. I wasn’t sure what’s more positive. Everything is terrible all the time. Period. So both Ken and myself, we got into AWS when AWS was smaller. If you clicked on all the services in the console, you could fit it on one screen. You cannot do that now and you have services hidden inside other services and it’s messy. It’s a more difficult experience getting started. So I very much sympathize with new people that are starting on AWS. Fortunately, there are a few good helpers. First of all, and I know this is going to sound corny, but it’s real: community. Community is a really, really big thing. Everybody’s been where you’ve been as newbies and they wanted to learn and now they want to help because, goddamn it, I don’t want anybody else to spend three days debugging this thing that turned out to be like a checkbox or something. That frustration helps people and there are a lot of communities online, a bunch of Slacks, a bunch of Discords, Twitter, Reddit-ish, not really, where people help each other. So definitely build on the community, ask for help, and actually work on something that fulfills a need of yours. So don’t just say, “I just want to learn AWS.” AWS is so, so, so wide as we’ve discussed. Pick something that’s annoying for you. For me, it was annoying that I couldn’t easily run a challenger to have it both on my phone and on my laptop wherever I go. I built a challenger server on AWS for myself. I played around with FUSE. I figured out how to do the domain names and routing. I figured out how to connect to stuff, how to secure it, and things like that. Find something that’s interesting to you. That can be something like running your own software. It can be something like building a video game. It can be something like getting started with machine learning. It can be something like racing with AI and ML, which is surprisingly fun to watch, even though I know very little about machine learning.
So find something that is interesting to you and try to build that with the help of community. I think that’s the best way to get started.
[00:15:31] CG: And Ken, you’ve done a lot with Lambda and you’ve actually created a talk called “The Case for Rails on Lambda”. Since here at Forem, we are a Ruby shop, we’re very interested in diving into this. But first, can you talk about what the Lambda service is and what people typically use it with?
[00:15:49] KC: Right. And I think there’s going to be two big areas there and a lot of those sort of took shape over time. So when Lambda was first introduced, it was basically what everybody looked at as a function as a service, right? Take a small unit of code, I don’t care how or like the platform it runs on, let me just put my code on the runtime and let the runtime execute my code for me. And you get one method, one function, and everything that happens in there is basically what it’s for. When it came out, microservices were hot. People were talking about breaking things up with a sledgehammer and taking a service and turning it into a whole bunch of them, maybe a few hundred, and there are these little functions that talk to each other. Since last year, December of 2020, I believe, Lambda containers is a thing. So now with Lambda, rather than taking your code and putting it on the computer or the runtime, you can actually take your code, take the same Docker image that was used as a runtime, put your code into it and then ship the whole Docker image. That’s a game changer and you can actually now have six virtual CPUs on Lambda, 10 gigabytes of memory and 10 gigabytes of storage on that container image. It can still be thought of as FaaS, that function as a service, and that’s just fine, and there’s plenty of use cases for that. But what it is evolving into is sort of this modern compute stack that fulfills on this promise of commoditized compute. Vlad mentioned a whole bunch of these things that he didn’t care about and where he was glad to spin up an EC2 instance or he talked about like Fargate, where he doesn’t have to worry about the node groups or how the things are assigned and things like that. And that’s my passion as well because I’m a smart person, but for the love of God, this stuff is hard and I don’t want to worry about things. And fun fact, Fargate and Lambda run on the same technology, I believe, that’s called Firecracker.
[00:17:40] VI: Yeah. We’re not lazy. We’re energy efficient.
[00:17:45] KC: And I think a lot of where my career lately has moved is trying to get ahead of the curve or to go where the puck is going, as I say, and go, “Well, if this is the commoditized compute platform of the future and if all of these things I don’t have to worry about, how can I start engineering things today that might look a little strange, like putting a framework inside of a function and seeing if that works?” I can tell you it does and it works really well. As things at AWS keep moving forward, Fargate, Lambda, et cetera, these things may sort of like mold into one. But I think at the end of the day, everybody, whether they love Kubernetes or not, we all want the same thing. We want to not care about a lot of stuff.
[00:18:29] BH: So you sort of got into what I was thinking about towards the end with the evolution of Lambda that you’re describing or like the evolution of how people are thinking about how they can use it in combination with some of the features that are coming out, compare and contrast Lambda and Fargate in this sense. I have a more obvious distinction if I’m thinking of Lambda as this place to run functions as a service. But if we’re trying to overpower it more, I think the differences kind of like become almost more vague. What is the biggest way to differentiate between Lambda and Fargate and maybe also like EC2 and all your options? Where does the code run?
[00:19:16] KC: So Vlad and I have talked about this before, and Vlad, please step in.
[00:19:21] VI: We disagree on it.
[00:19:23] KC: Yeah. I think the one big difference that I would approach. Okay, so there’s one that’s big that says, “Okay, the interoperability within the ecosystem of AWS itself.” Lambda has got a huge leg up on that. It can be invoked in many different ways and evented in many different ways. But when you look at it from a traditional application compute model, especially with regard to a container, a Lambda function is only going to take one request at a time. And you would think of Lambda as sand. Think of Fargate more as sort of these, not just boulders, but maybe pebbles or something like that, but it’s not going to be as granular as a Lambda container. A Lambda container is one request, one response. It’s going to live for maybe 20 or 30 minutes. It will have that frozen state in between, but essentially it’s still there. And it’s just one-to-one. You’ve got this one container, you’ve got the memory setting and the CPU setting, and that facilitates one request, HTTP hit or whatever you call it. With Fargate and other container systems, you might have your Rails application spinning up multiple processes behind a web server and it’s sort of facilitating sort of orchestrated memory and CPU where it’s a little bit sort of bigger. It has more boxes inside of the box.
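The "sand versus pebbles" granularity can be put into rough numbers. The sketch below assumes the usual steady-state estimate that concurrency is about requests per second times average request duration; the function names and the `workers_per_container` parameter are hypothetical, chosen only for illustration.

```python
import math


def lambda_instances_needed(requests_per_second: float, avg_duration_s: float) -> int:
    """One request per instance at a time, so instances in use ~= in-flight requests."""
    return math.ceil(requests_per_second * avg_duration_s)


def server_containers_needed(requests_per_second: float, avg_duration_s: float,
                             workers_per_container: int) -> int:
    """A container running a multi-process web server handles several requests at once."""
    in_flight = requests_per_second * avg_duration_s
    return math.ceil(in_flight / workers_per_container)


# 200 requests/second at 250 ms each is about 50 requests in flight.
sand = lambda_instances_needed(200, 0.25)
pebbles = server_containers_needed(200, 0.25, workers_per_container=8)
```

The same load maps to many small single-request units on one side and a handful of larger multi-worker containers on the other, which is the distinction being drawn here.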
[00:20:34] VI: And here I go disagreeing with Ken. So I love the idea of Lambda containers. The current implementation is an MVP. And I don’t like it. If you don’t know about the limit that a Lambda can have a package of at most 250 megs, if you’ve never hit that and had to do some very dirty hacks to get around that, you should not be using Lambda containers. There are valid use cases, many. Ken’s is a great example, putting Ruby in there for better performance. ML use cases where you have to ship a really, really, really big model with your code. That’s a valid use case, but that’s pretty much it. Don’t try to make it into something that it is not. So I’m recently actually working on a talk on this and I like to compare everything to pizza. So it depends on how many knobs you want. So think of Lambda, be it Vanilla Lambda, Lambda containers, whatever, as going to the restaurant and getting a pizza. You’re going to a restaurant. You cannot be naked on your couch, as we’ve all learned during the pandemic. You have some limitations at that point. You cannot spend three days in a restaurant, in most restaurants, I imagine. You have some limitations. And a lot of people can live with those limitations. They can build their applications with those limitations. So for Lambda, you can only run for 15 minutes. If there is something that’s taking longer than 15 minutes, your process is automatically going to get killed. A lot of people are fine with that limitation. Other people are not fine with that. So they’re moving to something with more options, something like getting pizza delivered home. You can order whatever pizza you want. You can stay on your couch. You don’t have to dress up and look pretty. That’s awesome. You get more options about what you’re doing. So a container, wherever it’s running, doesn’t matter. A container can run for however long you want it to. You can give it whatever power you want to give it. That’s an advantage.
And then we have the, I guess, most extreme option, which would be running on EC2 instances. So that’s the most bare-bones option. If you want a pizza and you want to put pineapple on it, a lot of restaurants aren’t going to deliver that to you. So you’re going to have to make that at home. If you have some very specific needs, again, Windows, if you need three terabytes of RAM, if you need a hundred gigabit networking, if you need super, super, super, super fast SSD storage, if you need those specific things, you’re going to go to that. So it’s a spectrum of options, depending on what you need. AWS is offering you everything, you have to pick it, and that’s hard.
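The restaurant, delivery, and cook-at-home spectrum can be written down as a small decision helper. This is a sketch of the reasoning in the conversation, not an official sizing guide: the 15-minute and 10 GB Lambda figures come from the episode, while `fargate_max_memory_gb` is a placeholder assumption to be checked against current limits, and `Workload` and `pick_compute` are hypothetical names.

```python
from dataclasses import dataclass

LAMBDA_MAX_SECONDS = 15 * 60   # Lambda run-time limit mentioned in the episode
LAMBDA_MAX_MEMORY_GB = 10      # Lambda memory ceiling mentioned in the episode


@dataclass
class Workload:
    max_runtime_seconds: int
    memory_gb: float
    needs_windows: bool = False
    needs_gpu: bool = False


def pick_compute(w: Workload, fargate_max_memory_gb: float = 30) -> str:
    """Start with the option that has the fewest knobs and move down only when forced."""
    # Windows and GPUs are named in the episode as things Fargate could not do at the time.
    if w.needs_windows or w.needs_gpu:
        return "ec2"
    if w.max_runtime_seconds <= LAMBDA_MAX_SECONDS and w.memory_gb <= LAMBDA_MAX_MEMORY_GB:
        return "lambda"
    # Assumed ceiling; Fargate's real limits have changed over time.
    if w.memory_gb <= fargate_max_memory_gb:
        return "fargate"
    return "ec2"
```

For instance, a one-minute job with 1 GB of memory lands on Lambda, an hour-long job with 4 GB lands on Fargate, and anything needing a GPU or terabytes of RAM falls through to EC2.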
[00:23:23] KC: The analysis paralysis. Right.
[00:23:25] VI: Yes, which, fortunately, is leading us into my second point, which is you should not be concerned about this. AWS has a really, really nice service that’s called Step Functions. It’s not fully mature, but it’s pretty, pretty good. Step Functions is basically an orchestrator. So it allows you to design business logic, to design business functions. And what you can do relatively easily is you can start with a Lambda and then you discover, “Oh, goddammit, I have to run this for more than 15 minutes. I’m going to throw in a Fargate in there.” And then you realize, “Oh, that’s not ideal either. I’m going to have to move to EC2s.” So it abstracts away the compute part for you. And most developers that are working on most apps that, again, don’t have very extreme requirements shouldn’t have to think about compute. Throw it into Step Functions, go with Lambda first because it’s easy, because it’s fast, because it’s patched, because it’s cheap, and all those advantages. Whenever you need to move away, you can easily do that move inside Step Functions and move your worker from Lambda to something like Fargate, to something like EC2. I think that’s where companies should be looking and where developers should be looking now. I don’t really care about it. Just let me do my job, let me fulfill that business requirement that I have to fill as fast and as efficiently as possible. We all know that code isn’t going to live forever. We’re going to have to maintain it. We’re going to have to upgrade it. We’re going to have to change it and it’s not going to be the same code in two years. Whenever I have to move that or to change that, I will. Start with what’s easy, Lambda is easy, and move to something that is more complex depending on your need.
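To make the "swap the worker inside Step Functions" idea concrete, here is a minimal sketch that builds two versions of a one-step state machine definition as plain dictionaries. The state name `ProcessOrder`, the function name, the cluster, and the task definition are all made-up placeholders; the Fargate variant also leaves out the networking configuration a real task would need, so treat this as an outline rather than a deployable definition.

```python
import json


def lambda_task(function_name: str) -> dict:
    """A Task state that invokes a Lambda function and passes the state input through."""
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::lambda:invoke",
        "Parameters": {"FunctionName": function_name, "Payload.$": "$"},
        "End": True,
    }


def fargate_task(cluster: str, task_definition: str) -> dict:
    """A Task state that runs a container on Fargate and waits for it to finish."""
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::ecs:runTask.sync",
        "Parameters": {
            "LaunchType": "FARGATE",
            "Cluster": cluster,
            "TaskDefinition": task_definition,
            # NetworkConfiguration omitted in this sketch.
        },
        "End": True,
    }


def state_machine(worker: dict) -> dict:
    """The workflow shape stays the same; only the worker state changes."""
    return {"StartAt": "ProcessOrder", "States": {"ProcessOrder": worker}}


# Hypothetical names, for illustration only.
v1 = state_machine(lambda_task("process-order"))
v2 = state_machine(fargate_task("workers", "process-order-task"))
v1_json = json.dumps(v1, indent=2)
```

Everything that calls the state machine sees the same entry point in both versions; the move from Lambda to Fargate is confined to the one state that does the work.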
[00:25:09] BH: Can we now talk about how exactly Lambda works? So we’re talking about using it for scalability or offloading of the need to do some of what Lambda does ourselves. But what about Lambda makes it so darn easy in its best case? What makes it scale?
[00:25:32] KC: Well, I think the thing that I like about that the most is that it can scale to zero and it can scale as much as you sort of need it to go within reason. I think a lot of times, if you have an application on Lambda, your database is going to be your scalability point. Because it’s going to hyperscale up functions and instances as need be. It’s thought of as function as a service. Technically, when you break it down, Lambda is one method. The way it works with Rails web applications is that Rails and the Ruby community did this wonderful thing many years ago where they adopted Rack as the sort of protocol above everything. So Rails apps are essentially Rack apps. Same would be true for Sinatra or other different Ruby frameworks. They’re all Rack applications. And HTTP is basically a hash. When your event comes in from AWS Lambda, it’s a Ruby Hash. And the way that I do it with a Lambda gem that integrates with API Gateway, application load balancers, et cetera, is it converts one hash to another hash. So when your Rails app is running in Lambda, you’ve essentially glued Rack right to the service. There is no web server in the middle. You don’t have to install Puma. You don’t have to install anything, Apache, Nginx. You’ve literally made Rails natively attached to their infrastructure, whether it’d be API Gateway or an application load balancer. And that’s all because it’s one line of code. It’s the most brilliant programming I’ve ever come up with. At the end of the day, it takes one hash and turns it into another. When you look at it from that perspective and then you’re like, “Okay, when I need to scale up, AWS will make that decision.” Because Lambda is essentially an invoke model. It’s not going to look at your CPU usage or other stuff like that. It just says, “Do you need more events?” In this case, HTTP requests. So as they flow in, they’ll warm up new instances for you.
You have the ability to pre-warm those instances, if you want to, or let Rails boot in a sufficient enough time to meet that request so that the customer doesn’t feel it. But essentially, that’s how Lambda works. It’s one method and it works with Rails really well because you can distill a Rails application down to one line of code, calling it from a Rack perspective.
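The "one hash to another hash" idea can be illustrated with a Python analog: WSGI plays roughly the role in Python that Rack plays in Ruby. This is not the gem discussed in the episode. The event fields follow the API Gateway HTTP API (payload version 2.0) shape, base64-encoded bodies are ignored, and `event_to_environ`, `make_handler`, and `hello_app` are names made up for this sketch.

```python
import io


def event_to_environ(event: dict) -> dict:
    """Convert an API Gateway HTTP API style event dict into a WSGI environ dict."""
    headers = event.get("headers") or {}
    body = (event.get("body") or "").encode("utf-8")
    environ = {
        "REQUEST_METHOD": event["requestContext"]["http"]["method"],
        "PATH_INFO": event["rawPath"],
        "QUERY_STRING": event.get("rawQueryString", ""),
        "SERVER_NAME": headers.get("host", "localhost"),
        "SERVER_PORT": "443",
        "SERVER_PROTOCOL": "HTTP/1.1",
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.version": (1, 0),
        "wsgi.url_scheme": "https",
        "wsgi.input": io.BytesIO(body),
        "wsgi.errors": io.StringIO(),
        "wsgi.multithread": False,
        "wsgi.multiprocess": False,
        "wsgi.run_once": False,
    }
    # Request headers become HTTP_* keys, as WSGI expects.
    for name, value in headers.items():
        environ["HTTP_" + name.upper().replace("-", "_")] = value
    return environ


def make_handler(app):
    """Wrap a WSGI app as a Lambda-style handler, with no web server in between."""
    def handler(event, context):
        captured = {}

        def start_response(status, response_headers, exc_info=None):
            captured["status"] = int(status.split(" ", 1)[0])
            captured["headers"] = dict(response_headers)

        chunks = app(event_to_environ(event), start_response)
        body = b"".join(chunks).decode("utf-8")
        return {"statusCode": captured["status"], "headers": captured["headers"], "body": body}

    return handler


def hello_app(environ, start_response):
    """Tiny WSGI app standing in for a full framework."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [f"Hello from {environ['PATH_INFO']}".encode("utf-8")]


handler = make_handler(hello_app)
```

The whole adapter is a dictionary translation on the way in and another on the way out, which is why the framework can sit directly behind API Gateway or a load balancer without Puma, Apache, or Nginx in the path.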
[00:27:45] CG: So you were talking about how it can use and work with Rails. Why was it seen as something that was not possible or desirable before? Can you expand on that a little bit more?
[00:27:54] KC: Yeah. I think Vlad pointed that out too. It used to be that you could only ship 250 megabytes of code. I think a lot of people, if you were to ask them, they’d go, “Well, Rails is just a monolith.” But if you’ve ever spun up a Node microservice and you looked at the package size, Rails is no bigger than Node with a few dependencies. It’s not that big. It could easily have fit inside of the 250 megabytes, and it did. We were doing Rails in Lambda prior to Lambda containers. I think a lot of people are still anchored, correctly so, on Lambda as a function thing, because that’s where its strong suit is. There were people, predecessors to myself, AWS Container Heroes, who tried to put Rails applications on Lambda and came up with a framework called Jets, and it literally dictated that the way for you to run your Rails app in Lambda was to take this big sledgehammer and break it up at the controller and action level and bust it up into hundreds of little functions. You should never go to microservices like that, unless there’s a good reason. Fun fact, there’s hardly ever good reasons. Rails apps will have one or two pain points. We have one application at Custom Ink still where 33% of this monolithic app can be distilled down to two API calls. From an SRE perspective, it would be easy to say let’s take those out and move them to maybe a Lambda function, maybe a Rails app in a Lambda function. I also think there’s like biases. Lambda has this thing called a cold start and it sounds super scary. I think a lot of people, when I talk to them about Rails, they would go, “Okay, wait a minute. Lambda can get the request. It services the request and then it dies and it goes away.” And what they were thinking would happen is that every time that you’d have to hit refresh on your browser, it would take three to four seconds to spin Rails up. So that’s like a no go.
If it had to take three to four seconds to start Rails up every time, but that’s not the way Lambda works. It’s never worked that way. There is a cold start for when an instance starts up, and I’ve seen a function live for as long as 40 minutes. It does get frozen in between requests, and that sounds scary too, but it’s not really, from your perspective. It only takes two milliseconds to unfreeze it. So it’s essentially still warm because it’s already loaded your application code into memory and it’s sitting there ready and waiting to take the next request.
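The difference between a cold start and a warm request shows up clearly in how a handler module is laid out. In this minimal sketch, `boot_application` is a hypothetical stand-in for the expensive framework boot; it runs at module load, which happens once per instance, while `handler` runs once per request on the already-loaded module.

```python
_boot_count = 0


def boot_application() -> dict:
    """Stand-in for the slow part (loading the framework, connecting to services)."""
    global _boot_count
    _boot_count += 1
    return {"routes": ["/designs", "/orders"]}


# Module-level code: executed once when the instance is created (the cold start).
APP = boot_application()
_invocations = 0


def handler(event, context):
    """Executed for every request; reuses APP, which is already in memory."""
    global _invocations
    _invocations += 1
    return {
        "cold_start": _invocations == 1,  # only the first request paid for the boot
        "boots": _boot_count,
        "routes": len(APP["routes"]),
    }
```

Calling `handler` repeatedly in the same process reports a cold start only on the first call and a boot count that stays at one, which mirrors the point that the boot cost is paid per instance rather than per request.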
[00:30:07] VI: I want to jump in here because the cold start problem is something that I hate with a passion and it’s something that highlights a very, very real problem with these serverless options and with this offloading of effort to AWS. So when Lambda came out a bunch of years ago, when we were all much younger and much fuller of hope, it was also a younger service. It behaved differently. Cold starts, if you were running Lambdas in a VPC, what were they, 15 to 20 seconds?
[00:30:41] KC: Yeah, that was horrible.
[00:30:42] VI: It was horrible. It was terrible. Many articles were published on that. But that was in like 2017, 2018. Since then Lambda has evolved massively because, again, AWS manages this for you. You’re offloading effort to AWS, and I don’t know if you know this, AWS has a lot of very, very, very good engineers that are working on this.
[00:31:08] KC: Not us.
[00:31:09] VI: Not us. Much better engineers. So what happens is services are getting better every day. So if you’re looking at Lambda and whenever a discussion about cold starts happens, somebody is going to link to an article from 2018 with pretty graphs and scientific data and be like, “Now, see, Lambda is bad.” Well, no, it’s not. That’s old. There were a bunch of improvements that AWS did, and now you can get that 20-second delay, that 20-second cold start that used to be a thing, down to something less than a second, but that’s hard. As an engineer, as a developer, it’s on you. You have to be the one that reads the news. You have to be the one that is staying updated. You have to be the one that is continuously learning. And building on what Ken said, we all have those biases. We all have those, “Oh my God, I fought with Lambda cold starts,” or, “With this particular problem of Lambda so much, I’m never going to use it again.” Well, spoiler, it isn’t the same service you used. A lot of things have changed and that’s something that’s hard, staying up to date with everything is hard.
[00:32:37] BH: I want to now take this opportunity to get technical advice for challenges that are in my world. So our company core software is called Forem, and that is our website, DEV.to, which we started with, that’s a Rails app. We were working on that for a few years on its own and then we decided that we wanted to go open source and gradually turn it into a generic platform where anyone can go and build their own community. So DEV is a big broad platform for developers and it offers a lot of breadth, but we now have some folks working on their own spaces. Within the technical space, you’ve got folks like in the AWS community who are saying like, “I actually run this myself and provide more depth,” like where we can have like deeper discussions over the nature of EC2 as opposed to like the breadth that we tend to be able to deliver for people, learning a lot of stuff on DEV. Anyway, that’s the problem that we work on. Just the other day, we’ve officially rolled out our self-host version of Forem, and that means, self-host, meaning you can run it in your closet on bare metal. That’s actually not fully fleshed out right now, but we do have pretty well fleshed-out solutions on AWS, GCP, and DigitalOcean for running our specialized Rails app, for running your own version of DEV or whatever community you want to spin up independently.
[00:34:13] BH: So in that transition where originally we were not going to be open source first, generic, run everywhere, run any possibility, the code had a lot of hard coded ideas about what the software was and we needed to step back from that. And in the process, we also had to refactor a lot of stuff, which used like hyper specialized ideas around AWS services, for example, like we called into Lambda functions just to do certain things. And that was not a heavy reliance of the software, but we did that for certain opportunities conveniently and then we also hosted our images directly on S3, like not in a file store capacity, certainly not agnostic in any way. But we’ve turned the corner where now we want to be as agnostic as possible. We want this to run anywhere and any services that can run it especially well, great, awesome, let’s build a layer to have our containers sit there. So anyway, that’s the background where we’re at and what I’m thinking is having heard some chat about Lambda, about Fargate, around what the future of some of these ideas are, what’s mature, what’s immature, what services beyond EC2 can you imagine being useful in this new version of what we care about? Meaning like we’re not calling into Lambda functions as just a dedicated part of the app. We want everything sitting in our container, but I’m hearing about Fargate and Lambda and their specialized ways to sort of take on this idea. Are there any services in AWS which you feel are kind of interesting to keep an eye on in terms of just like opportunities to maintain this Rails in a container idea in just like the most interesting kind of efficient, scalable way possible that maybe we should keep an eye on? EC2 is certainly like our baseline for how to do this. But is there an idea or direction in these other services that is worth exploring at least as a hypothetical or even for the community to kind of take on, like just different ways to host our software?
[00:36:33] VI: So I want to say two things actually before I say this. First of all, you mentioned you want this to be cloud agnostic. That’s super valid and this is a good example. You started with something AWS native, and then you realized, “Oh, we want to open source this. We want to give people the opportunity to run this on their own hardware, in their own clouds, wherever they want.” That’s super, super valid. Yeah. You could run it on Heroku. You could run it on wherever else your heart desires. And that’s awesome. You built something. You got a lot of validation. And then when you had to move to something else, you built that. Nobody should be building with cloud agnostic as their first target or their first goal, unless they have very, very, very strong reasons to do that because they’re just going to be wasting time with that. So my second question would be, “What do you want to do? Are you just looking for cool services to keep an eye on to include them into the DEV platform? Are you looking for better ways to package and deliver this platform to folks to make it easier to use? What are your targets?” Because we can go in many directions here.
[00:37:46] BH: Yeah. Yeah. And because this is a broad show about AWS and solving problems via AWS, we don’t need to get too specific, but a big idea of what we want to be doing is simplifying the orchestration of running many communities, ensuring that there’s both data portability as we need it, but also backups and durability. But ultimately, each application is its own fully baked version and DEV reaches millions of developers every month and it’s certainly like huge. We can justify like a lot of special attention for it, but we also love the idea that this paradigm can get a lot smaller and we can offer the same code in hyper specialized ways. Folks could host their own version of a Facebook groups kind of thing for just their family if they want, six or seven people sharing like a photo book sort of thing. Anyway, of course, this is open source software. Our idea is just that like creativity abounds, but from a technical perspective, it would be great if we could maximally simplify how we can host a few thousand without it being like that sort of horizontal scale with a decent amount of cognitive simplicity for how we do it and things like that. And that is what we’re working on, but we’re doing it in a certain way. I’m just curious within the AWS ecosystem and tools and that sort of thing, when this is our challenge, where does your head go?
[00:39:34] VI: The answer is in two parts actually. First of all, you mentioned wanting to give people a very easy option to self-host this and to run it themselves. What I’ve seen be very, very, very successful here are Terraform modules for running your software in ECS and Fargate. It’s a straightforward way of running software. And by packaging it into a Terraform module, it’s a pretty well-known way of distributing software. And there are already a lot of apps that do this. Atlantis is GitOps automation for Terraform, for example. They have an absolutely awesome module in Terraform that’s deploying Atlantis using ECS and Fargate. It’s very, very easy to get started. It has most best practices already in there for you. You just run a couple of commands and it’s up and running pre-packaged for you. And you still have a bunch of knobs and twists to configure if you want more memory for your server or things like that. I’ve seen that as a very, very successful way of getting people started fast and it’s a way lower barrier of entry than a Helm chart would have because Helm would need Kubernetes. For people that are not familiar with Helm, Helm is a way of packaging software for Kubernetes. Think of it like a step above Homebrew, apt-get, Yum, Pacman, and all the Linux package managers. So instead of doing Pacman install MySQL, you can do stuff like Helm install JIRA, Helm install Spinnaker, which needs like three services and two databases. And it does that for you and it’s easy to discover and easy-ish to run, but you still need to know Kubernetes. You still need to maintain that. So that’s way harder. I would definitely go for a Terraform module with ECS and Fargate, people would relatively easily be able to get started with that. The second option you mentioned is families and wanting to share photos and more stuff for people that don’t have as much technical knowledge. That’s a SaaS. Congratulations. You’re getting into a very fun place. 
And this is something most people don’t know because AWS is wide. AWS actually has a full team that is dedicated to SaaS businesses on AWS. And besides that team existing, they build awesome content, including a bunch of awesome blog posts and re:Invent talks. They’re named the AWS SaaS Factory. So they publish a lot of white papers, a lot of videos about, “Hey, you want to run a SaaS. You want to offer your product as a service to somebody else.” In this case, say, DEV, but hosted and managed by you. They have videos and white papers showing all the options you have available, what are the tradeoffs, what you need to choose in terms of tenant isolation, in terms of security, in terms of price performance, in terms of all that. So I would definitely look at all the content that they’ve been putting out because it is absolutely outstanding. Some of their re:Invent videos are top notch and not a lot of people know about this, which is a big problem. That’s where I would start. Ken, what are your thoughts?
[00:42:53] KC: Let me just start off by saying this. I’ve taken the same Rails app and I’ve run it on Heroku and I’ve run it on EC2 and I’ve run it in Kubernetes and then I’ve taken the same app and I’ve migrated it to Lambda. There’s this aspect of running something and compute that’s completely boring and you can do it in any one of them. What’s not boring, and what I think is where the future is headed, is how well a cloud operates with the tooling inside of it. So I have this firm belief that when you look at AWS as sort of a programming language and a toolkit that there’s a lot of things there to learn, to have that analysis, “Where do I start? Which way do I go?” But when you start looking at it as like almost in a way as like your Heroku walled garden marketplace, which is a little bit less sophisticated, turn on Redis, turn on Memcached, there’s actually things there that go well beyond these traditional barriers of, like, where does my app run, to how it responds to events. The classic example I have is when you run say Rails in Lambda, we’re all familiar with the Rails apps that need to do background jobs. Traditionally, you would use something like Sidekiq. It’s the gold standard. You’d set up a Redis database in your Rails app when you want to do that background job processing. It inserts a record through Active Job into the Redis database. Well, what happens behind the scenes is that you actually have to spawn a whole another instance of your Rails app to poll and listen for those and manage that. There’s people that have written that software that makes that easy. I can imagine how one might support that if you were shipping your Rails application in a cloud agnostic way. If you were a cloud native and your function or your compute platform responded to events, you could natively plug into Lambda and have what I call managed polling, like AWS does it for you. 
It will manage the polling infrastructure to something like SQS and it will drive events and also scale up the number of instances of your Rails function or your Rails app in a different function to meet the requirements for what’s in the queue. And you don’t have to do any of it. You don’t have to say, “Hey, is there a good open source author that’s figured out how to spawn a Rails process that hosts my app that has made all these great decisions about how to poll the software?” Which you would pay for too. You pay for polling. And how can I let these things that are things that I think all software engineers need be the sort of commoditized operation of the cloud itself? I believe in that future. But Ben, you have a very strict business case that I don’t think jives too well. I’d say there’s a decision to be made on, like, if you want to ship, run the platform in any cloud through tooling or do you want to focus on doing each cloud well, here’s how we would do it well in AWS versus Azure or GCP, et cetera.
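A minimal sketch of what the "managed polling" side looks like from the function's point of view, written in Python for illustration rather than Ruby. There is no polling loop in the code: the platform hands the function a batch of queue messages and the function only processes them. The `Records`, `messageId`, and `body` fields follow the event shape Lambda delivers for SQS, and the `batchItemFailures` return value only has an effect if partial batch responses (`ReportBatchItemFailures`) are enabled on the event source mapping. `process_job` and the sample payloads are hypothetical.

```python
import json


def process_job(job):
    # Hypothetical business logic; a real app might send an email or resize an image.
    if "task" not in job:
        raise ValueError("job is missing a task")
    return f"done:{job['task']}"


def handler(event, context):
    """Entry point the platform calls with a batch of queue messages."""
    failures = []
    for record in event.get("Records", []):
        try:
            process_job(json.loads(record["body"]))
        except Exception:
            # Report only the failed message so the rest of the batch is not retried.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}


# Local stand-in for what the platform would deliver; the values are made up.
sample_event = {
    "Records": [
        {"messageId": "m-1", "body": json.dumps({"task": "send_welcome_email"})},
        {"messageId": "m-2", "body": json.dumps({"oops": True})},
    ]
}
result = handler(sample_event, None)
```

Everything a Sidekiq-style worker process would normally own (the loop, the backoff, the number of workers) is absent here, which is the tradeoff being described: less to run yourself, in exchange for depending on one cloud's event model.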
[00:45:41] VI: I can confirm that I’ve done a bunch of multi-cloud work with SaaS companies that I’m working with. And as much as the software engineer in me wants to believe in abstraction and clean interfaces and everything is nice and rainbows, you’re never going to have the same code running in all the clouds, unless you’re just using bare-bones services, like EC2, some kind of storage, some kind of load balancer, and that is fine. You can definitely do that, but it’s going to be a lot, a lot, a lot of engineering effort. What you can do and what most multi-cloud companies do successfully is having their shared logic in a library and then building wrappers for every single cloud. So if you are using Lambda to optimize something, you can use Google Cloud Functions in Google. You can use Azure Functions. You can use whatever you want. DigitalOcean, as far as I know, they don’t yet have that. Awesome. In DigitalOcean, you could be running that in Ruby. You could be running it with Sidekiq, for example. You could be running that Elm code yourself. So each cloud is going to have its own wrapper, its own app, kind of, and then optimize for stuff in there. How far you want to optimize? That’s definitely going to be an option, but you’re never going to be running the same code unless you just want to throw a lot of engineering resources at it.
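The "shared logic in a library, one wrapper per cloud" pattern can be sketched roughly like this. It is an illustration under assumptions, not anyone's actual code: `build_notification`, `aws_lambda_handler`, and `self_hosted_worker` are hypothetical names, the Lambda wrapper assumes a direct invocation where the event is a plain dict, and the self-hosted wrapper assumes a worker that receives a JSON string from whatever queue you run yourself.

```python
import json


def build_notification(user, action):
    """Shared, cloud-neutral logic: no cloud SDK is imported or needed here."""
    return {"to": user, "subject": f"{user} {action}"}


def aws_lambda_handler(event, context):
    """Thin wrapper for AWS Lambda (assumes the event is a plain dict)."""
    return build_notification(event["user"], event["action"])


def self_hosted_worker(raw_message):
    """Thin wrapper for a worker you run yourself (assumes a JSON string payload)."""
    payload = json.loads(raw_message)
    return build_notification(payload["user"], payload["action"])


from_lambda = aws_lambda_handler({"user": "ben", "action": "published a post"}, None)
from_worker = self_hosted_worker(json.dumps({"user": "ben", "action": "published a post"}))
```

Both wrappers return the same result because only the input handling differs. Each additional cloud means another small adapter to write and maintain, which is where the engineering effort Vlad mentions accumulates.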
[00:47:14] KC: Yeah. And I think to quote my inner Corey Quinn, the real vendor lock-in is IAM. Right? It’s not that you use S3. It’s the permissions that underlie all these, when your app is sufficiently sophisticated and has to talk to additional resources.
[00:47:29] VI: Even if you have one container, you’ll have to talk to resources, you have to be allowed to pull the image, you have to be allowed to receive traffic from your load balancer, but not from like all the internet because you don’t want to be pinged on all the ports by scripts that scan the internet. So yeah, there are a lot of points of lock-in, but they’re not lock-in. They’re effort offloading.
[00:47:52] BH: When we talk about like a lot of these ideas, we have sort of debates, we can talk about what the right abstraction layer is, should I go with the specialized AWS tool? Should I rely on AWS configuration for my business logic? Or should I build it into my application and pop it in a single AWS service? As long as we’re bought in that that’s an AWS shop and that’s how things are done, we’re not trying to say like, “Oh, we should be multi-cloud.” If we’re talking the company organizes around the idea that it’s an AWS shop, how do you make cultural agreements about where to put business logic, I think? Does the organization understand that we should write this into our application code and use a few AWS services or lean on AWS to do a lot of that work for us and there’s going to be trade-offs here? How does an organization get down to some shared principles around that sort of thing?
[00:48:58] VI: In a very, very, very hard way. So think of all AWS services as their own tiny SaaS company. They might fulfill your needs or they may not. You might be able to implement your logic in there or you may not. Let me think of an example here, CloudWatch. CloudWatch scaling is a thing. It’s working, but it’s not very feature-filled because AWS has to provide that service to a lot of customers. At a big scale, they have to focus on a specific set of customers. You have CloudWatch math, which is okay, I can scale based on the number of messages in a queue. I can do some minor additions, subtractions, trends on that. That might be awesome for a lot of people. For others, it might not be enough. Like, “No, I want to scale on these 20 different metrics and I want to do a bunch of complex math on it and I want to do some very weird things.” CloudWatch is not going to let me do that. So I’m not going to be able to offload that effort to AWS. I’m going to do it myself. Right now, in 2021, the way of doing it yourself is going to Kubernetes because a lot of folks are standardizing on that and you have a lot of tooling, where AWS is not fitting your options, you’re going to Kubernetes because you have open source software that you can leverage and do whatever you want. Now how you actually do those decisions and how you actually decide what should run where and where to move, that’s hard. That depends on each company, on each culture and on their own targets. Is it a company that has a bunch of senior developers that can draw on a workflow engine very well and have experience with that? Sure. Run on that. What would be a good example here? Honeycomb. So Honeycomb is an observability company. They’re awesome. I’m not affiliated with them in any way. But they have a lot of senior engineers that had a lot of experience with traditional, and I’m not saying that in a bad way, traditional EC2 instances. 
So they’re not doing containers, they’re not doing Kubernetes because they had a team that already knew how to run software on EC2 instances very, very, very well. Getting that whole team to learn Kubernetes or to learn something else would not have fulfilled their business service. There are companies that are full of serverless developers. [INAUDIBLE 00:51:26] is one of them. They have a lot of serverless developers. They’re building on that. For them, to run on Kubernetes, with me, they would have to learn everything else. For them, to run on EC2, they would have to change their whole company culture. They would have to retrain. It doesn’t make sense. So it’s always a tradeoff of, “What does my team look like? What does my company look like? What’s my timeline? What are my restrictions? What are my strengths? What are my weaknesses?” And it’s a hard discussion to have.
[00:51:56] BH: So Custom Ink seems like the size of organization that is a good AWS customer, big enough to really benefit from what Lambda can offer in terms of scalability, like an actual need for some of these scalability needs, but small enough that it’s not Amazon itself, it’s not going to push the envelope about the maximum web scale needs. So it’s sort of like fits in a good space of justifying any complexity that comes from just adopting these AWS services in the first place because the scalability is certainly justified. It seems like a good goldilocks zone for like a really easy way to just be very AWS first. Do you have advice for a shop that’s maybe smaller than Custom Ink, less mature? What pitfalls to avoid in terms of like unnecessary AWS complexity before you get to the point where you’re facing the challenges of human scalability and the scalability of request delivery and things like that? What should smaller shops not do that Custom Ink needs to do?
[00:53:09] KC: Set up Kubernetes would be at the top of the list. No matter which way you go in the cloud, go for the managed option first, right? Like use simple stuff, a friend at a local Ruby group, she created a simple sort of, I think it’s sold in the Heroku marketplace and allows for simple file storage where she does all the work around S3. Another friend of mine that set up that sort of Expedited SSL in the Heroku marketplace, Michael Buckbee. They each started off with it’s one, it’s her, Michael was him, there’s not a team of engineers, just writing basic software and using AWS and they use each service well. Michael did something that I think that was interesting. It was a side story, but he made a service that probably no engineer thought anybody would buy, but that’s because engineers don’t understand what people don’t want to do and then setting up SSL services is at the top of the list I think for everybody. But when another app was being made, she used S3. I would think dependent upon your cloud maturity model and where you are with it, Amazon is going to be a good story for you. The services that you may need, whether it’d be one or two or three together, I can’t think of anything that’s a no-go, like don’t do this. I think one big one is, how about this? Do not pay attention to your bill. We talk a lot about scaling and stuff like that and I think things do cost money. So dependent upon how you store stuff within S3 or how you set up your software platform, you may take on a lot of risks for costs. So I think for everybody, just pay attention to your bill.
[00:54:51] VI: Making mistakes is normal and it’s totally fine and acceptable. That’s how you’ll learn. That’s how you’re going to learn that EFS, while a very good service, EFS is basically Amazon’s managed NFS service. So storage on a network in a whole region, you can access it from any AZ. It sounds very, very easy. It’s way easier to integrate with than S3, if you’ve never worked with it, but it’s also very, very expensive and very, very slow when compared to S3. You’re going to learn that. That’s one of the services, for example, that I would stay away from, unless you need it. And that’s fine. That’s how people learn. That’s how they mature. And we’re going back to what I said initially, build on the community, ask, look at, “Oh, how did other people implement this? What was their experience?” Because most of the people are blogging about it. And I’d also say be careful and try to look at limitations of services. There might be services that are super, super new, that are not ready. Allow time for learning services and for experimentation, especially if it’s a company that’s not very experienced with AWS or that doesn’t have access to people that have done all these mistakes before.
[00:56:09] CG: Thank you both for joining us today.
[00:56:11] KC: Christina, thank you so much.
[00:56:12] VI: Thank you so much for having us.
[00:56:23] BH: This show is produced and mixed by Levi Sharpe. Editorial oversight by Jess Lee, Peter Frank, and Saron Yitbarek. Our theme song is by Slow Biz. If you have any questions or comments, email [email protected] and make sure to join us for our DevDiscuss Twitter chats every Tuesday at 9:00 PM US Eastern Time. Or if you want to start your own discussion, write a post on DEV using the #discuss. Please rate and subscribe to this show on Apple Podcasts.