Beware the Trojan Source!
In this episode, we talk about a new Apple settlement and a big win for workers in Portugal. Then we speak with Nicholas Boucher, PhD candidate at the University of Cambridge, about new research into something they are calling a “Trojan Source” attack. And finally, we chat with Safia Abdalla, senior software engineer at Microsoft, about new updates and features in the recently released .NET 6.
Saron Yitbarek is the founder of Disco, host of the CodeNewbie podcast, and co-host of the base.cs podcast.
Josh Puetz is Principal Software Engineer at Forem.
Nicholas Boucher is a PhD Candidate at the University of Cambridge and a graduate of Harvard University. He performs research on the security of systems ranging from machine learning pipelines to compilers.
Safia Abdalla is a software engineer working on open-source technologies at Microsoft and an open-source maintainer on the nteract project. When she’s not working on open source, she enjoys writing and running.
[00:00:10] SY: Welcome to DevNews, the news show for developers by developers, where we cover the latest in the world of tech. I’m Saron Yitbarek, Founder of Disco.
[00:00:19] JP: And I’m Josh Puetz, Principal Engineer at Forem.
[00:00:21] SY: This week, we’re talking about a new Apple settlement and a big win for workers in Portugal.
[00:00:26] JP: Then we speak with Nicholas Boucher, PhD Candidate at the University of Cambridge, about new research into something they’re calling a ‘Trojan Source’ attack.
[00:00:34] NB: A summary of the paper is that we’ve found that we can encode source code files of most programming languages in such a way that compilers will do something different than what the developers expect.
[00:00:45] SY: Then we chat with Safia Abdalla, Senior Software Engineer at Microsoft, about new updates and features in the recently released .NET 6.
[00:00:53] SA: In addition to kind of being a very interesting problem to solve, which is how do you make a technology approachable to new developers, it also allowed us to collaborate with teams all across that stack.
[00:01:05] SY: So you might remember last season when we talked about a class action lawsuit Apple settled with US app developers. Apple agreed to pay out $100 million to app makers and to allow app developers to promote alternative payment methods that circumvent Apple’s 30% commission on in-app purchases of digital goods and services. Now to be fair, the alternative payment methods couldn’t be in the apps, and developers would have to email users about payment methods they can use outside of the app, but it was still a small victory for those app developers. Well, this week, Apple has settled another lawsuit, but this time with its own store employees. The story, which was first reported by Bloomberg, says that the company has agreed to pay out $30 million to employees who say that they were forced to stay after working hours, sometimes for as long as 45 extra minutes, unpaid, to have their bags searched for potential theft before being able to leave. Apparently, Apple’s CEO, Tim Cook, was unaware of this policy until the 2013 lawsuit, and the policy was terminated in 2015. So that’s interesting. I didn’t know that Apple used to do that, actually.
[00:02:16] JP: What struck me about this particular settlement was if you read through the fine print, it only impacts Apple retail employees in California. Everywhere else, no compensation. From what I understand, it’s pretty common for retail workers to be searched upon leaving or entering their workplace. But 45 minutes, I mean, that’s like they’re boarding…
[00:02:38] SY: That’s a while.
[00:02:39] JP: Yeah. That’s like a lunch break. That’s a while. I think to me what was surprising about this article is that it felt off-brand. Like on the one hand, I totally appreciate that Apple has very expensive products that are small and relatively easy to steal. Right? So I get that, just from a purely business perspective of protecting their stores and their goods. But when I think about the Apple brand of design and creativity and expression and all that, I don’t know, I guess I always felt that that extended to their store employees, where store employees were kind of part of that same family. I hate using the word family in a business context, but it does feel that way. You know what I mean? It feels like an extension of the Apple brand, and they’re always very knowledgeable and super helpful. And it feels weird to frisk your brother. You know what I mean? I don’t know.
[00:03:31] JP: Yeah. On the other hand, though, this has kind of been an ongoing theme: Apple treats its retail employees quite differently than its non-retail employees. For better or worse, their retail employees are not treated as well. They’re really not paid as well.
[00:03:49] SY: Right.
[00:03:49] JP: They don’t get to work in the big spaceship in California. They’re out in actual stores. So I mean, this kind of tracks. I think the disappointing part to me is that it took a lawsuit for Apple to actually compensate their employees.
[00:04:00] SY: That part is crazy.
[00:04:03] JP: Yeah.
[00:04:03] SY: Yeah. The fact that they were doing this bag check thing was very surprising to me, just because it felt very off-brand for Apple. But on top of that, not paying them felt very off-brand. I guess I always had this impression, and maybe it’s just good marketing, that Apple generally did the right thing and stood by its people, at least in terms of its users; it’s a very user-advocate type of brand. And it’s really shocking that they didn’t pay for the extra time. That feels like an obvious thing to do. Like if you’re going to make me stay at work for whatever reason, then I should be paid for that. So it’s surprising that it took a lawsuit to fix.
[00:04:38] JP: Yeah. I agree with that. On the other hand, though, I do just kind of feel like we’ve definitely seen cracks in Apple’s façade in terms of how they treat their employees, how they treat their customers, how they treat developers. Their marketing team is incredible. Their advertising and their branding is incredible. But I think we’re seeing more and more that they’re just a large company like any other large company. And at the end of the day, they’re looking out for their profits.
[00:05:03] SY: They’re beholden to their shareholders at the end of the day. That’s really what it’s all for. Well, anyway, good for the employees who got their money. And I mean, it’s great that they’re no longer doing this. But it was also kind of surprising to me that Tim Cook found out about it in 2013 and it still took two years to stop, which, I don’t know what that was about.
[00:05:22] JP: Yeah, that part’s pretty disappointing. I know a CEO is not going to be privy to every single little micro issue and policy, but to your point, Tim Cook goes on television and tells people that they believe in their employees and they care about their customers. For that person to know about this policy and not immediately have someone look into it, that part feels really bad as well.
[00:05:46] SY: Yeah. That doesn’t feel great.
[00:05:47] JP: Well, next we have another story that’s a big win for employees. The Portuguese government passed legislation banning employers from contacting their staff outside of business hours, as well as monitoring their work remotely. If employers contact their employees outside of their contracted working hours, they can be fined unless it’s deemed an emergency. Remote employees are also required to go into physical workplaces to meet with supervisors and build community with their fellow employees so they don’t suffer too much isolation. The move is an effort not only to protect workers from working outside of their contracted hours, but also to promote Portugal as a destination to work remotely. Ana Mendes Godinho, Portuguese Labor Minister, said while presenting the new legislation, “We consider Portugal one of the best places in the world for these digital nomads and remote workers to choose to live and we want to attract them to Portugal.” This is really interesting legislation. What do you think about this, Saron?
[00:06:41] SY: Yeah. So I’m totally down for the no contact after work hours. I think that totally makes sense. I think that’s very respectful of boundaries and it takes the burden off of the employee to set the boundary because it’s like, “Well, it’s not me. The government said you can’t call me. I would love to hear from you. But ultimately, the government said no.” You know what I mean? It kind of sets up that boundary, which is really nice. But in terms of trying to be a place to attract digital nomads, the whole requirement to go to a physical workplace, I feel like, totally kills that. You know what I mean? If I’m working in America, but I want to be digitally nomadic and I want to move to Portugal, obviously that’s not going to work because I need to have an office to go into. So I didn’t get that part.
[00:07:24] JP: Yeah. That part is really strange to me where on the one hand, they’re saying have less contact with your employees for better or for worse. On the other hand, they’re saying, “Bring them on in.” It almost feels like maybe this was a concession to physical employers that were kind of like, “Oh, we don’t love this.”
[00:07:39] SY: It feels like a concession. I think that’s right. Yeah.
[00:07:41] JP: Yeah. That part’s really strange. So the other part that struck me about this law was that employers could be fined for contacting their employees outside of work hours. I’m really curious, who reports that and how does it get enforced?
[00:07:56] SY: Right. Is there like a hotline or something where you just send screenshots of texts and you’re like, “Look what happened. This dude, it’s 9:45 PM, man. Look what he’s up to. Fine him $100”?
[00:08:07] JP: Yeah. Share to channel government reports.
[00:08:11] SY: Yes.
[00:08:13] JP: Yeah. That’s crazy. I really wonder how this law gets enforced, especially because the company’s not going to self-report it. It would have to be the employee that’s reporting this and then it’s going to be pretty obvious who’s reporting it. So is there going to be backlash?
[00:08:26] SY: Right. It feels like a backlash situation. It feels like you would need some courage to actually do the reporting.
[00:08:32] JP: I love the idea though, that Portugal is like, “Hey, digital nomads, come work here. We want to attract you.” I think that’s a really cool idea.
[00:08:39] SY: That’s cool. Yeah.
[00:08:40] JP: I’ve known several people that have worked remotely abroad and chosen Portugal for its lovely climate and friendly people and low cost of living. It seems really interesting to me. I wonder if we’ll see other countries attempt to court remote workers and digital nomads with laws like this.
[00:08:57] SY: I’m curious. What are your thoughts on not full-time in the office, but requiring one or two days a week as part of the remote deal, the remote setup? How do you feel about that as a stipulation?
[00:09:11] JP: Well, I will say off the bat, I’m a completely remote worker. So any kind of “you must go into the office” is like, “Ooh!” I don’t love it. Yeah. I find that it has the opposite effect. Right? We want to attract digital nomads. We want to attract remote workers, but then having that in-office requirement makes it so you have to cluster workers in physical locations. It doesn’t say what office they have to go to. Maybe it’s just “an office”. It’s a little unclear, from the way the law is written and the research we’ve done, whether they have to go to their home office or just to an office anywhere with another employee, but I thought that part really kind of killed the other part of the law that was trying to encourage remote working.
[00:09:59] SY: Exactly, a hundred percent. I feel like the biggest benefit of remote working is that you can live anywhere you want. You can live halfway across the country. You can live in another country. You can literally be wherever you want and still have the job that you want and work for the company that you want to work for. And the moment that any type of in-person requirement shows up, you lose that freedom. And that’s the biggest advantage of working remotely.
[00:10:23] JP: Yeah.
[00:10:24] SY: Yeah. I mean, obviously having some flexibility is good, but if it’s not fully remote, I feel like you lose the biggest benefit of being remote and any type of in-person requirement just kind of undoes all that.
[00:10:37] JP: Yeah. There’s another provision of this law that gives young parents the right to work from home without having to get approval from their boss, as long as their child is less than eight years old, which is fantastic.
[00:10:49] SY: Oh, that’s cool. That’s great.
[00:10:50] JP: Yeah. Overall, I read this law as being pitched to companies that might not have a lot of remote work going on right now, to try to push them into being more remote friendly. And I wonder if, for a fully remote company, there might be some exceptions to the in-person work requirements. But really, for the rest of us, not so much. It’s interesting that the labor minister of Portugal brought up attracting foreign workers and making Portugal a digital nomad destination when the rest of the law is really around work-life balance. It’s not really around remote work. If they were targeting remote workers, I’d expect to see things in this law like “you can use your home country’s tax rates” or “we’ll give you a longer visa if you’re remote working in Portugal.” And there’s none of that in here. So I think this law is just overall pitched at in-person companies, to encourage them to allow remote work or work from home, but maintain work-life balance.
[00:11:53] SY: Yep. Absolutely. Coming up next, we talk about a new kind of hack researchers are calling a ‘Trojan Source’ attack after this.
[00:12:24] SY: Here with us is Nicholas Boucher, PhD Candidate at the University of Cambridge, and author of the research paper Trojan Source: Invisible Vulnerabilities. Thank you so much for joining us.
[00:12:35] NB: Thank you for having me.
[00:12:36] SY: So tell us about what this paper is and how you started looking into this research.
[00:12:42] NB: A summary of the paper is that we’ve found that we can encode source code files of most programming languages in such a way that compilers will do something different than what the developers expect. And what this means is if you’re clever about how you do these text encodings, you can write up some software and you can sneak some kind of a bug or a vulnerability or a backdoor into the code, and you can do so in such a way that a developer who’s reviewing the code will very likely not realize that that vulnerability is there. The way that we stumbled across this research is that some of my research partners and I here at Cambridge, along with some collaborators at the University of Toronto in Canada, had been looking into natural language processing systems, and we became interested in the idea of crafting adversarial examples: inputs to machine learning systems that cause them to do things that you would not expect them to do. And what we had seen is that in text-based machine learning systems, adversarial examples were traditionally created using techniques like misspelling words and things like this. And you can get some interesting enough results, but we thought it would be really cool if there was a way to create these adversarial examples in a way that wasn’t so visible to humans. And what we stumbled across was that we could use strange text encodings, things within, say, the Unicode text specification that were a little bit less standard, but perfectly valid according to how we are allowed to encode text. And we saw that we could have things like toxic content classifiers and machine translation and a whole bunch of other machine learning and natural language processing tasks just completely fall over when we presented them with these strange encodings.
We started thinking about it and we said, “Well, what more could we do with these encodings?” And we quickly stumbled across the idea that, “Well, hey, we can probably do some cool stuff with compilers, with interpreters, with the source code in general and see if we can do some kind of evil things.” And what we ended up stumbling across was the Trojan Source paper.
[00:14:41] JP: I’m curious, are there similar types of attacks to the Trojan Source attack that have been used by hackers in the past?
[00:14:48] NB: Yeah. That’s a hard question to answer explicitly because certainly there’s all sorts of stuff out there that we just might not know about if it’s not, say, public on GitHub, but there are concrete examples that I can give of similar techniques that have been used in slightly different settings, or in some cases some very similar settings. So to me, the most interesting example is an attack that we saw some reporting on in the wild, on the Ethereum blockchain, specifically with smart contracts. And what we saw was that, apparently, bidirectional override characters have been used in the wild to switch the sender and the receiver of, say, payments in smart contracts. The way we came across this is that we were actually doing some scanning on GitHub to see if we could find any evidence of these attacks being exploited in the wild. And what we found was a set of static code checkers that were looking for the switching of argument order in Ethereum smart contracts. That is certainly a similar technique to what we’re proposing. I mean, arguably it’s very similar in terms of one aspect of what you can do, which is switching the order of arguments. In the Trojan Source paper, our proposal is that you can do a lot more than just switching arguments. You can cause comparisons to fail. You can inject lines of code that look like they’re not there because they look like they’re comments. You could, for example, cause a function to return early before it actually executes the logic that a developer may expect. So I think the short answer to your question is, yes, there are versions of this that have been used in the wild. There are versions of this that have been written about in non-academic contexts, but what we believe we’re presenting is a somewhat more comprehensive overview of what we believe is a potentially very devastating set of attacks.
[00:16:34] SY: Tell me more about how these Trojan Source attacks are novel compared to the other kinds of similar attacks that you just talked about.
[00:16:44] NB: Yeah, absolutely. Well, right off the top, the most straightforward thing is that, to our knowledge, there’s no academic work that describes any of the other variants of the attack that I was just talking about. In terms of how ours is different from those, to be very explicit, the examples that we found were taking two arguments to functions and using bidirectional override characters to swap the order of those two arguments. In most of the examples that we found, based on the techniques that were used, these arguments could only be variables that were a single character, like a variable named A or B or something like this. In the right setting, you could cause those two to be displayed in an order that was reversed from what they actually were. Now that setting was very specific. It required comments to be kind of within the function-call parentheses, and some things like this that looked perhaps a little bit unnatural, but it certainly used bidirectional overrides to cause problems. In our attacks, we propose three different general techniques. So one of those techniques is we take comments and make them look like code, or we take code and make it look like comments, by taking bidirectional override characters and embedding them in comments on the same line as some real code. We have a similar technique where we do this within strings. So the idea is that if I have a string literal, I can inject a right-to-left override or left-to-right override, depending on the setting, or a combination thereof. And I can make that string at the encoding level contain these control characters, but at the visualization level, I can take things that are within the string and make them look as if they’re actually code that’s being executed, which is not the case when that code gets to the compiler.
So in short, I think the difference between what we found in the wild and what we’re talking about in the paper is just the breadth of what you can do: whether you want to cause code that’s there to look like it’s not there, whether you want to cause code that’s not there to look like it’s there, or whether you want some function to short-circuit altogether. These are the techniques that we’re offering in the Trojan Source paper, but we certainly want to give credit to a variety of security researchers, primarily working in the wild, who have used bidirectional overrides to swap the order of arguments.
[00:18:59] JP: Would you consider this attack vector an issue with Unicode? Is it an issue with compilers and their handling of Unicode, or is it something else?
[00:19:11] NB: Yeah, I think that’s a really good question. And I don’t know that there’s a straightforward answer to it. So here’s the methodology I’ve used to approach this. Compilers often implement a formal language specification; not in every case, but often there is some formal document describing what a proper language is supposed to do. For languages that have formal specifications, you can argue that the language specification is where you might want to protect against something like this by saying, “Hey, these directionality override characters can actually be pretty dangerous. We might want to forbid them from being used in certain contexts.” And then you would expect the compilers to implement that behavior downstream. But the challenge there is that not every language has a formal specification. Maybe those changes take a long time to happen. Maybe it’s not agreed upon by the spec maintainers that that’s desirable. So then you have the next step in the pipeline, which is compilers and interpreters, which can do things like throw compiler warnings if they detect the presence of potential attacks like this. I think a great example of a defense that’s been really well built in that regard is the Rust compiler. The Rust team was a group that we sent a disclosure to as we were doing this research, and they did a really good job, in my opinion, of having thorough, verbose warnings that they throw when the specific attack pattern is detected. But the reality there, too, is that not every compiler or interpreter is going to see this as their issue to fix. And that’s fine. I wouldn’t necessarily agree, but there are other places you can protect against this. And perhaps one of the most impactful overall would be static code scanners, because if we have high-value production code that’s being deployed somewhere, we would hope that there is some sort of static code scanning going on.
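To make the string technique concrete, here is a minimal Python sketch adapted from the “stretched string” idea described in the Trojan Source paper. The source line is built with escape sequences so the invisible characters are explicit, then executed. The variable names are illustrative only, and how the line renders depends on the editor: a bidi-aware renderer may make the trailing `# Check if admin` look like a comment placed after the closing quote, while Python treats it as part of the string literal, so the `!=` check is true and the admin branch runs.

```python
# Explicit bidirectional formatting characters (Unicode UAX #9).
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE, never closed with PDF
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

# Logically, everything between the two double quotes on the `if` line is one
# string literal, including the text that may *render* as a trailing comment.
source = (
    'access_level = "user"\n'
    "is_admin = False\n"
    f'if access_level != "user{RLO} {LRI}# Check if admin{PDI} {LRI}":\n'
    "    is_admin = True\n"
)

namespace = {}
exec(source, namespace)
print(namespace["is_admin"])  # True: "user" differs from the stretched literal
```

The point of running it rather than just displaying it is that the interpreter only ever sees the logical order, regardless of how any editor chooses to draw the line.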
And I think that takes a lot of different forms, but checking for bidirectional overrides is a relatively straightforward thing that I would expect most static code scanners to be able to add functionality for. And what I would finish with is to say that there are those who would look at this and say, “This might not be a compiler, interpreter, or static code scanner problem whatsoever. This has to do with the visualization of text.” And if it’s about the visualization of text, then naturally any mitigation against this should be in text rendering systems, and people who take this viewpoint would argue that development environments, text editors, and things like the code repository front-ends that store our git repos online are places where we could work to visualize these directionality override characters in some way. And just making those visible would hopefully mitigate this attack. But my personal viewpoint would be that every one of those defenses is valid. And whether this is a vulnerability in Unicode perhaps depends on the context, but if we defend at every layer in the pipeline, we’re more likely to pick up on these attacks when they happen. Rather than saying that only one in five compilers, just to make up some numbers, have fixes in them, if one in five of each of these parts in the pipeline have fixes, then it’s much more likely that if there is an attack, it will be detected.
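As a rough illustration of how small a scanner-side check can be, here is a Python sketch that flags every explicit bidirectional formatting character in a piece of source text. The function name `find_bidi_controls` is hypothetical; the character set follows the Unicode Bidirectional Algorithm (UAX #9). A real scanner would presumably add file handling, allow-lists for legitimate mixed-direction text, and readable reporting.

```python
import unicodedata

# Explicit embedding/override/isolate controls from UAX #9.
BIDI_CONTROLS = frozenset(
    "\u202a\u202b\u202c\u202d\u202e"  # LRE, RLE, PDF, LRO, RLO
    "\u2066\u2067\u2068\u2069"        # LRI, RLI, FSI, PDI
)

def find_bidi_controls(text):
    """Return (line, column, character name) for each bidi control in text.

    Lines and columns are 1-based; only newline characters count as line breaks.
    """
    hits = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((lineno, col, unicodedata.name(ch)))
    return hits

print(find_bidi_controls("x = 1\ns = 'abc\u202e'\n"))
# [(2, 9, 'RIGHT-TO-LEFT OVERRIDE')]
```

Flagging every occurrence is deliberately noisy; as the interview notes later, such characters have legitimate uses, so a hit is a prompt to look closer rather than proof of an attack.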
[00:22:26] SY: So tell me more about the dangers of an attack like this. How big could the impact be?
[00:22:33] NB: I think what’s really scary about these potential attacks is that, in the right setting, they’re not visible. That’s how we titled the paper, and it’s probably the key message we’re trying to get across here. If you have employees at a big tech company or producers of some big product that all share common toolkits, common development pipelines and things like this, and you as an attacker have knowledge of the tools that they’re using, you can most certainly craft these attacks targeting that particular tool chain if there aren’t mitigations in place. And I think it is entirely plausible that you could deceive a whole group of code reviewers into thinking that one of these attacks is not present. And where that gets really scary is if you had, say, an insider at one of these companies. Well, now that gives you a realistic attack vector where you can use that insider to put code in that is not very likely to get seen by someone else. And that gets incredibly scary within the context of supply chain attacks. That gets scary within key pieces of software that are utilized across the ecosystem. Off the top of my head, something like the recent SolarWinds attacks comes to mind. Certainly the SolarWinds attacks weren’t related to Trojan Source in any way, to my knowledge. But the point is that when you compromise something that is upstream and used across the ecosystem, which could certainly be possible with these invisible vulnerabilities, well, the consequences can be absolutely immense. And that is what potentially scares me: an invisible attack that goes undetected for quite some time in some large upstream dependency.
[00:24:11] JP: Can you walk us through like soup to nuts what a person would have to do to implement this attack? We’ve been talking about comments related to the bidirectional nature of some Unicode strings. Is it embedding code inside of a Unicode string and then injecting into a source code base? I’m just wondering if you could talk us through like the specifics if I wanted to implement this attack.
[00:24:37] NB: So if you sat down at your workstation today and decided that you wanted to implement this attack, what you would need to do is come up with some sort of method for generating the control sequences that change the order of text in Unicode. It doesn’t have to be Unicode, but Unicode is probably our most common text standard and is generally accepted by most compilers and interpreters. So let’s say that it’s Unicode. So what would you probably do? You would probably pick your scripting language of choice. Let’s say Python. And you would know that in Python, there are escape sequences that you can use that will generate control characters that are otherwise invisible. Then you would take the program that you are hoping to craft an exploit into, and maybe you just have a print statement in Python or something along those lines that prints out that source code. But then you modify that source code to shift around the order of the tokens on any given line of code, and then you use the escape sequences that generate control characters to inject, say, right-to-left overrides into that line of code. You can think of it in some ways as conceptually similar to a SQL injection attack: the idea that you’re putting in some characters that aren’t necessarily expected, and you’re doing so in a way that is more or less syntactically valid in the setting that you’re using them in, and it causes some unintended consequences to happen. In this particular case, I would say you would probably be injecting these right-to-left override characters, assuming you’re writing code that looks like English, which is normally left to right. You would probably use a sequence of right-to-left overrides and left-to-right overrides, and you would put them into string literals, if you had string literals in the code, or into comments, maybe multiline comments, for example.
And you would use those to be able to swap around the display order of tokens on that line of code. And ultimately, what you’re doing is saying that if Program A is the source code that is seen by the compiler, the logically encoded order of the tokens in the source code, you’re anagramming that with control characters into Program B, which is not the actual logic, but it’s what you want the developer, the code reviewer, to see. And then when they render that in their text editor or web editor, whatever they want in their environment of choice, they’re most likely going to see the anagrammed version and not the logically encoded version, due to those control characters you injected.
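The workflow described above can be sketched in Python, adapting the “early return” example from the Trojan Source paper. The script writes Program A, the logical order the interpreter sees, using the `\u2067` escape for RIGHT-TO-LEFT ISOLATE and deliberately never closing it, so a bidi-aware editor may display the docstring line roughly as `''' Subtract funds from bank account then return; '''`. The `bank` and `subtract_funds` names are illustrative, and the exact on-screen result is an assumption that varies by editor; the executed behaviour does not.

```python
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE, deliberately never closed with PDI

# Program A: the docstring ends *before* ";return", so the function exits early.
program_a = (
    "bank = {'alice': 100}\n"
    "def subtract_funds(account, amount):\n"
    f"    ''' Subtract funds from bank account then {RLI}''' ;return\n"
    "    bank[account] -= amount\n"
    "    return\n"
)

# ascii() exposes the hidden control character that an editor would not show.
print(ascii(program_a.splitlines()[2]))

namespace = {}
exec(program_a, namespace)
namespace["subtract_funds"]("alice", 50)
print(namespace["bank"])  # {'alice': 100}: the subtraction never ran
```

Printing with `ascii()` is one cheap way to see the logically encoded order instead of the rendered one.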
[00:27:07] JP: And because you’re basically flipping this around and there’s no real visualization of that, unless you’re specifically looking for this attack vector, you’re not really going to catch it. Right?
[00:27:17] NB: Yeah. That’s the fear. There are some caveats to put in there that I think are important. And that’s to say that this is definitely dependent upon the setting that you’re in. So for example, there are some text editors that, presumably as a security measure, just don’t apply control characters, as in, if you put a control character in the code, it’ll display the code point, the number representing the control character, instead of doing the thing that that control character would normally cause a text display engine to do. Now that being said, from our testing right now, that’s certainly the minority of code editors that we’ve seen. One example of a vulnerable code editor is VS Code. In VS Code, if you have some text that has control characters in it, at least as of right now, it will apply those control characters. If it says to do a right-to-left embedding or right-to-left override, it will swap the order of the characters and not show you that invisible control character. That is the setting that we’re worried about here.
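The safer editor behaviour described here, showing the code point instead of applying it, can be approximated in a few lines. This Python sketch (the `reveal_bidi` name is hypothetical) replaces each explicit bidi control with a visible `<U+XXXX>` marker, which is roughly what a defensive editor, diff viewer, or review tool might display.

```python
# Explicit bidi controls from UAX #9: LRE, RLE, PDF, LRO, RLO, LRI, RLI, FSI, PDI.
BIDI_CONTROLS = frozenset("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")

def reveal_bidi(text):
    """Replace every bidi control character with a visible <U+XXXX> marker."""
    return "".join(
        f"<U+{ord(ch):04X}>" if ch in BIDI_CONTROLS else ch for ch in text
    )

line = 'if access_level != "user\u202e \u2066# Check if admin\u2069 \u2066":'
print(reveal_bidi(line))
# if access_level != "user<U+202E> <U+2066># Check if admin<U+2069> <U+2066>":
```

Once the markers are visible, it is obvious that the apparent comment actually sits inside the string literal.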
[00:28:20] SY: So given that this type of attack is invisible, really hard to detect, what can we do if anything to defend against it or potentially even block these attacks?
[00:28:32] NB: Yeah, absolutely. I think one of the silver linings here is that this attack is actually pretty easy to defend against. And if you know what you’re looking for, it’s actually pretty easy to spot too. But the trick is you have to know what you’re looking for, and that’s why we’re hoping to spread the word, this particular attack vector. So the easiest thing is if you see directionality override characters or these bidirectional Unicode sequences in code, that’s a red flag that something could be off. Now it’s not a guarantee that there’s a problem. There’s all sorts of legitimate reasons that you might want to change the display order of texts. Like for example, if you are a developer who writes in both English and Hebrew, for example, well, that means you write in both a left to right and a right to left language. And if you decide that you want to change the order that sequences are displayed when you are mixing those two languages, you might have a text editor that injects some of these characters. So that is to say that simply seeing that there are directionality override characters in source code isn’t a guarantee that there’s a problem. But that being said, having looked through, well, actually the entirety of public GitHub, all commits that were committed to GitHub in 2021 thus far, I can tell you that there is actually not all that much of these directionality override characters in practice. But that being said, if you want to narrow this group down even more, you can say that every directionality override character normally has a closing character. So if you turn on a right to left override, you can turn off that right to left override when you’re finished with it, so to speak. And the nature of this attack to our knowledge is only possible if you do not properly close that directionality override sequence once you open them. 
So the real trick here is this: if we have, say, a string literal that contains a directionality override, and I start overriding the direction in that string literal and don’t close the override before the end of the literal, it’s going to affect the display of the code next to that literal. However, if I terminate that override sequence with a turn-off-override character within the same string literal, the effect of that reordering is contained within the string, and it doesn’t cause a Trojan Source vulnerability, at least not one that we’re aware of. So, in summary, if you want to protect against this attack, you should look for unterminated directionality override characters in any Unicode-encoded source code file. And if you find them, you should double-check that the logically encoded version of that source code is indeed what you expect it to be. You can do that by just looking at the raw bytes that are used to encode that source code. And with any luck, we’ll see more and more code scanning tools pick up on these potential indicators of attack and warn about them. I’ll also say that we have seen a number of compilers and interpreters start to throw warnings and errors for this sort of thing. We’ve also seen that shortly after we released the paper, GitHub started putting a warning banner on top of any code that contains these directionality override characters. So these are really great ways to tip you off, to say, “There might be something wrong here.” It’s not a guarantee, but you should look closely at that line.
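As a rough illustration of the check Boucher describes, here is a minimal Python sketch. It is not taken from the paper or from any existing scanner, and the function names are hypothetical. It assumes that treating each line as the scope of a directionality override is a reasonable stand-in for treating each string literal or comment as the scope; a real tool would use the language’s own lexer. It flags overrides, embeddings, and isolates that are still open at the end of a line, and it renders a line in logical (storage) order, with its UTF-8 bytes, so you can compare what the compiler reads against what the editor displays.

```python
import unicodedata

# Explicit directional formatting characters from Unicode UAX #9.
EMBEDDINGS_AND_OVERRIDES = {"\u202a", "\u202b", "\u202d", "\u202e"}  # LRE, RLE, LRO, RLO
ISOLATES = {"\u2066", "\u2067", "\u2068"}  # LRI, RLI, FSI
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING closes an embedding or override
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE closes an isolate


def unterminated_bidi(line: str) -> list[tuple[int, str]]:
    """Return (column, character name) for every opener still open at end of line.

    Simplified UAX #9 bookkeeping (no overflow limits): a PDF closes the most
    recent embedding/override unless an isolate is open on top of it, and a PDI
    closes the most recent isolate together with anything opened inside it.
    """
    stack: list[tuple[int, str]] = []
    for col, ch in enumerate(line):
        if ch in EMBEDDINGS_AND_OVERRIDES or ch in ISOLATES:
            stack.append((col, ch))
        elif ch == PDF:
            if stack and stack[-1][1] in EMBEDDINGS_AND_OVERRIDES:
                stack.pop()
        elif ch == PDI:
            for i in range(len(stack) - 1, -1, -1):
                if stack[i][1] in ISOLATES:
                    del stack[i:]
                    break
    return [(col, unicodedata.name(ch)) for col, ch in stack]


def scan_source(text: str) -> list[tuple[int, int, str]]:
    """Flag unterminated openers line by line as (line number, column, name)."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, name in unterminated_bidi(line):
            findings.append((lineno, col, name))
    return findings


def logical_view(line: str) -> str:
    """Show a line in logical (storage) order with invisible format characters
    (Unicode category Cf, which includes all bidi controls) spelled out."""
    return "".join(
        f"<U+{ord(ch):04X}>" if unicodedata.category(ch) == "Cf" else ch
        for ch in line
    )


def utf8_hex(line: str) -> str:
    """Hex dump of the UTF-8 bytes a compiler or interpreter actually reads."""
    return line.encode("utf-8").hex(" ")


if __name__ == "__main__":
    suspicious = 'msg = "\u202eabc"'  # override opened but never closed
    contained = 'msg = "\u202eabc\u202c"'  # override closed inside the literal
    print(unterminated_bidi(suspicious))  # [(7, 'RIGHT-TO-LEFT OVERRIDE')]
    print(unterminated_bidi(contained))  # []
    print(logical_view(suspicious))  # msg = "<U+202E>abc
    print(utf8_hex("a\u202e"))  # 61 e2 80 ae
```

The per-line scope is deliberately conservative: an override that is closed inside the same literal, as in `contained`, passes, while one that leaks past the closing quote is reported. A finding is only a prompt to inspect `logical_view` or `utf8_hex` for that line, since legitimate mixed-direction text can also contain these characters.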
[00:32:01] JP: So your paper does point out this is a theoretical attack, and I’m wondering if there’s any sense you have of the likelihood of someone actually exploiting something like this.
[00:32:11] NB: Yeah, it’s a really good question. I think it’s a hard one to answer.
[00:32:16] JP: Right.
[00:32:17] NB: So I can tell you what I don’t think will happen and what I think might happen, but it’s only my personal opinion. What I don’t think is going to happen is that all of a sudden we’re going to start seeing millions and millions of people trying to use these control characters to sneak stuff into lots of repositories, like the Linux kernel or something like that, and finding success, because I think that at this point there are enough compilers and enough people and enough mitigations, like GitHub’s warning banner, that will tip off someone somewhere to say, “We should look closely at that line of code.” It’s still a potential threat vector, but I don’t think that we’re going to start seeing large, massive attacks on, say, the open source ecosystem right now. What worries me more is the scenario where you have, say, a company that maintains some private code base, and there’s an insider threat at that company, or maybe some kind of advanced persistent threat actor that has the ability to introduce code into a pipeline within that company in some way. That’s hard for us to measure, to say how likely it is or whether it’s even possible. But intuitively, there certainly are threat actors that are very well-funded. There are threat actors that are very active and persistent in different ways. And if you have the patience and you have the time and you have the resources, what this does is give you a tool to say, “Here’s a vulnerability that you can’t see,” at least in the right setting. And I think that there is a very real possibility that we see some of these larger, more powerful adversaries use this in very targeted ways as another tool in their toolkit.
[00:33:57] SY: Is there anything else that we haven’t covered yet that you want to talk about?
[00:34:02] NB: There’s actually one other thing I’ll point out, which is an interesting thing we stumbled across in this research, and that’s the nuances of coordinated vulnerability disclosure. When we started this research, we thought that the entirety of the contribution of this particular piece of work would be to point out and say, “Here’s something we think is a vulnerability, here’s a taxonomy you can use to think about it, and we’re interested to see if anyone can think of any other cool tricks like this.” But then we started realizing, “We don’t just want to take this vulnerability, throw it out there, and see how many unmitigated systems are attacked on day zero. We want to go through the responsible disclosure process and notify, at the very least, all of the compilers and web front ends that we know are affected from our own testing.” And that’s precisely what we did, but the interesting thing was that this vulnerability affects a lot of stuff. It affects pretty much every compiler and interpreter that we tested. It affects a very large number of code editors and websites and things like that. We did our best to notify as many as we could, and it put us in this interesting position where we felt like we were trying to send disclosures across the entire industry. It was a really interesting experience. There were all sorts of things that we as researchers just hadn’t considered before. For example, if there is a compiler that is open source and has lots of contributors but doesn’t happen to have a dedicated security team that’s willing to receive an embargoed disclosure, how are you supposed to notify them so that they can prepare a mitigation before the attack is public? That’s a genuinely tough position to be in. That’s one example. There were probably dozens of scenarios like that that we came across.
And we talk about that a little bit in the results of the paper and I expect that we will be writing about that more in the future.
[00:35:58] SY: Well, thank you so much for joining us.
[00:35:59] NB: Thank you very much. It was a pleasure to be here.
[00:36:07] SY: Coming up next, we get into the details of the newly released .NET 6 after this.
[00:36:24] SY: Here with us is Safia Abdalla, Senior Software Engineer at Microsoft. Thank you so much for joining us.
[00:36:31] SA: Thank you so much for having me.
[00:36:33] JP: So tell us about your role at Microsoft and what you’re doing on a day-to-day basis.
[00:36:38] SA: I work in an organization known as the Developer Division. And as the name might suggest, this organization is responsible for building a lot of the developer-focused tooling and projects that come out of Microsoft. This includes things like TypeScript, VS Code, and .NET, which is the sub-team that I’m a part of, and in particular ASP.NET, which is a web framework built on top of .NET. So a lot of my day-to-day work involves working on ASP.NET. .NET, both the runtime and ASP.NET, the web framework, as well as a lot of other technologies in the .NET ecosystem, are actually open source. And that was one of my big motivations for moving to this team: I wanted to work in a code base and on a project that was open source and had a lot of community engagement and involvement. That’s where I’m at home. My day-to-day is, surprisingly, being on GitHub a lot, interacting with customers on issues, and reviewing PRs from my team and from community contributors. So I think what’s really fun about the work that I do at the moment is that when I squint, I don’t really see a big difference between the open source work that I’ve done in the past with other open source communities I’ve been involved in and what I do on a day-to-day basis with ASP.NET. Obviously, there are some differences because each project is unique, but the same philosophies around openness, community engagement, and learning and teaching with each other in public apply in both scenarios.
[00:38:07] SY: So what is the current state of .NET? Where does it stand in the world of software, in the world of tech?
[00:38:13] SA: So .NET, in and of itself, at its core, is a runtime that is capable of executing code. The primary programming language that people interact with when they’re writing code that is going to run on the .NET runtime is C#. That being said, it’s actually possible to write code in a variety of other languages, F# being one of them. Python is another, and even Ruby, which can be compiled down to IL, or intermediate language code, that runs on the .NET runtime. So you have the .NET runtime. You have the C# programming language. You have an ecosystem of tooling, primarily in Visual Studio, to help developers be productive with .NET: IntelliSense, autocomplete, refactoring, all of that fun stuff that helps us write clean, good code. So yeah, runtime, language, ecosystem of tooling, and I think there’s another layer to that stack, which is all of the frameworks that are built on top of the .NET ecosystem. That includes ASP.NET, which is the web framework. Another one that comes to mind is EF Core, which is a data layer framework. So that’s how I think about the .NET ecosystem at large.
[00:39:28] JP: Can you tell us what your role was in this new .NET release?
[00:39:32] SA: Yeah. So as I mentioned, I am part of the ASP.NET team. We had a lot of fun working on some interesting stuff for what we call the .NET 6 release, which shipped the second week of November, I believe. When you’re actually working on a product, release dates and shipping are all a blur in your head because you’re so in the energy of it all. But I worked on a couple of interesting technologies. I think one of the more fun ones, or the one that I’m most excited to push forward, is something called Minimal APIs, which is essentially an approach for simplifying the getting-started experience for building web APIs with .NET. Some of the feedback we’ve gotten in the past from people who are trying to get started building web apps with .NET is that there’s a lot of ceremony involved. As a new developer, or someone just trying to get something out quickly, there’s a lot of stuff you have to set up and learn. And so we really homed in on creating a minimal experience, as the name might suggest, that’s a lot more beginner friendly. A lot of the ceremony has been removed. We’ve tried to introduce sensible defaults and helpful behavior that allow developers to write less code and get the same results. So that was a really interesting project to work on. I think what was great about it is that some of the functionality actually spanned multiple teams. We were leveraging new syntax that required changes to the C# language so that people could write code that was a lot cleaner and simpler. So in addition to being a very interesting problem to solve, which is how do you make a technology approachable to new developers, it also allowed us to collaborate with teams all across the stack that I was describing earlier.
[00:41:21] SY: What are some of the biggest new features and updates in version 6?
[00:44:27] JP: What were some of the biggest challenges you and your team came across while building .NET 6?
[00:44:33] SA: The challenges we experienced don’t differ a lot from what other engineering teams experience on a day-to-day basis, which I think comes down to understanding what problems you need to solve. For example, as I mentioned earlier, it’s all open source. We get feedback from users primarily via issues on GitHub. Part of the work is looking through all of the issues and understanding what the priorities are for different things, what problems are worth solving, and what bugs are worth fixing. And that, I think, is always the delicate balance as a developer: knowing not just how to fix something or when to ship it, but what to work on and why to work on it. I think those are always challenging. Sometimes you have a little bit of FOMO: “Oh, should we have actually worked on that feature, or fixed that bug? Was it more important than we thought?” Or, “Oh, maybe we spent too much time thinking about this particular feature.” It turns out it wasn’t worth investing that much time in it. So I think that delicate balance of figuring out what to focus your energy on has been the most challenging part, particularly when rolling out some of the new features.
[00:45:43] SY: Was there anything in this release that you were hoping would be there, but didn’t quite make it?
[00:45:48] SA: More bug fixes.
[00:45:50] SY: Okay.
[00:45:51] SA: Yeah. I think I’ve arrived at that point in my engineering career where I have stopped being interested in shiny new features. So I was like, “Man, if only we’d fixed that bug that’s been in the backlog.” It affects a certain number of people and it’s gotten decent enough priority, but it’s always been in the backlog. If I could fix all of them with a snap of my fingers, I would. I think those things are really important because that’s what drives the quality and consistency of any product. And I think it’s sometimes a lot easier to ship new features than to look back at all of the bugs that have accumulated and address those. So I think we should fix more bugs. I really do.
[00:46:35] JP: Can you give us a hint at some of the things that might be coming down the pipeline with future releases of .NET?
[00:46:41] SA: Yes. With minimal APIs, we’re going to continue to improve and polish that experience. One of the things that I’m really excited to be working on personally is improving OpenAPI integration with minimal APIs. OpenAPI is a schema and convention for documenting an API. So you can outline what requests certain endpoints accept and what responses they return, and we want to improve the integration between those two technologies. As for what our plans are overall, one of the things that I think is really cool about working on ASP.NET is that it’s open source. So if people are curious to see what we’re working on, you can actually go to our GitHub repo and click on Issues. We’ve documented what all of our different labels and milestones mean. So if you want to see, “Hey, what are we going to be working on for .NET 7 Preview 1?” you can find all of the items that are labeled that way and follow along with the team as we’re building things. So I think that’s a really interesting and fun aspect of working in open source: you can really follow along with us as we build .NET 7 onward.
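To make the idea of an API description concrete, here is a small hand-written sketch of an OpenAPI 3.0 document, built as a Python dictionary and serialized to JSON. The `GET /todos/{id}` endpoint, its title, and its schema are invented for illustration; this is not output from ASP.NET or any of its tooling. It shows the two things described above: the request an endpoint accepts (here, an integer path parameter) and the responses it can return.

```python
import json

# Hypothetical endpoint described with the OpenAPI 3.0 document structure.
spec = {
    "openapi": "3.0.3",
    "info": {"title": "Todo API (hypothetical)", "version": "1.0.0"},
    "paths": {
        "/todos/{id}": {
            "get": {
                "summary": "Fetch a single todo item",
                # What the endpoint accepts: one path parameter.
                "parameters": [
                    {
                        "name": "id",
                        "in": "path",
                        "required": True,  # OpenAPI requires path parameters to be required
                        "schema": {"type": "integer"},
                    }
                ],
                # What the endpoint returns, keyed by HTTP status code.
                "responses": {
                    "200": {
                        "description": "The todo item",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "type": "object",
                                    "properties": {
                                        "id": {"type": "integer"},
                                        "title": {"type": "string"},
                                        "done": {"type": "boolean"},
                                    },
                                }
                            }
                        },
                    },
                    "404": {"description": "No todo with that id"},
                },
            }
        }
    },
}

print(json.dumps(spec, indent=2))
```

Because the description is plain JSON, tools such as documentation viewers or client generators can read it without knowing anything about the server’s implementation, which is what makes tighter integration with a web framework useful.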
[00:47:49] SY: Is there anything that we didn’t cover that you want to talk about?
[00:47:52] SA: Yeah. Another thing I would remind everyone listening is that ASP.NET is open source, and we’re always looking for people to get started contributing. One of the things that I’ve personally focused on is identifying ways that we can improve our new-contributor experience, whether by improving our build documentation, making it easier to use things like GitHub Codespaces to get started contributing to the repo, or a variety of other things. So in addition to following along with us as we build features, you can actually help build them yourself, which I think is pretty cool. That’s something that I always like to remind people: you can be as much a part of the technology as you are a user of it.
[00:48:35] SY: Well, thank you so much Safia for being here.
[00:48:37] SA: Thank you so much for having me.
[00:48:49] SY: Thank you for listening to DevNews. This show is produced and mixed by Levi Sharpe. Editorial oversight is provided by Peter Frank, Ben Halpern, and Jess Lee. Our theme music is by Dan Powell. If you have any questions or comments, dial into our Google Voice at +1 (929) 500-1513 or email us at [email protected]. Please rate and subscribe to this show wherever you get your podcasts.