There just really isn't any magical sauce in the algorithm...
In this episode, we talk about Timnit Gebru’s new research institute, researchers’ contentious relationship with Facebook, and a company that has been secretly helping governments track people’s mobile phones. Then we chat with Dennis Ushakov, Fleet Developer at JetBrains, about Fleet, the company’s new IDE. Finally, we speak with Julian McAuley, computer science professor at the University of California San Diego, about an internal TikTok document the New York Times obtained titled, TikTok Algo 101.
Saron Yitbarek is the founder of Disco, host of the CodeNewbie podcast, and co-host of the base.cs podcast.
Josh Puetz is Principal Software Engineer at Forem.
Dennis started his career at Intel, working on the Apache Harmony project. Then he moved to JetBrains to work on productivity tooling for Ruby and Web technologies, driving the RubyMine and WebStorm projects as a team lead. Now he's focused on Fleet – an IDE of the future, built on a distributed architecture. When he's not working, he enjoys driving cars and open-water swimming.
Julian McAuley is a Professor at UC San Diego, where he works on applications of machine learning to problems involving personalization, and teaches classes on personalized recommendation. He likes bicycling and baroque keyboard.
[00:00:10] SY: Welcome to DevNews, the news show for developers by developers, where we cover the latest in the world of tech. I’m Saron Yitbarek, Founder of Disco.
[00:00:19] JP: And I’m Josh Puetz, Principal Engineer at Forem.
[00:00:22] SY: This week, we’re talking about Timnit Gebru’s new research institute, researchers’ contentious relationship with Facebook, and a company that has been secretly helping governments track people’s mobile phones.
[00:00:34] JP: Then we’ll speak with Dennis Ushakov, Fleet Developer at JetBrains, about Fleet, the company’s new IDE.
[00:00:40] DU: Fleet is what we call the next-generation IDE, which aims to be fully remote and also fast and snappy, and it also supports collaborative development.
[00:00:51] SY: Then we speak with Julian McAuley, Computer Science Professor at the University of California, San Diego, about an internal TikTok document the New York Times obtained titled, “TikTok Algo 101”.
[00:01:04] JM: Likewise, you can see in the TikTok document that clickbait was a problem. So they have to deliberately down-weight stuff that seems clickbait-y.
[00:01:14] SY: So you might remember back in Season 2 when we covered the termination of Timnit Gebru, a co-leader of Google’s Ethical AI team, who said she was fired after she sent an email criticizing the company’s efforts to hire more minorities, as well as biases in their AI. And we had spoken to Julien Cornebise, an Honorary Associate Professor at University College London and a former researcher with DeepMind, Google’s AI lab, to gain more insight about Dr. Gebru being let go and its potential wider impact on the research world.
[00:01:46] JC: Timnit is a tall poppy. She is someone who really speaks her mind, who has been fantastic. That is a requisite for the kind of work she is doing, who is not afraid to call a spade a spade, and she’s brilliant. Are brilliant people the kind of complacent and quiet employees? No, they’re not. Can a company handle that? You’d assume that Google from its reputation and the claim that it made should be able to handle absolutely brilliant people who speak freely. And to me, that’s really the signal here, more chilling than just what happened to Timnit.
[00:02:26] SY: Well, now exactly one year after Dr. Gebru’s termination, she has launched her own ethical AI research institute called the Distributed Artificial Intelligence Research Institute. The organization’s website describes the institute as “a space for independent, community-rooted AI research free from Big Tech’s pervasive influence.” Burn! Love that! Love that! The site also says that the harms embedded in AI are preventable, and that the institute aims to create a more positive vision for AI while staying independent from “structures and systems” that incentivize profit over ethics and individual wellbeing. The Institute received $3.7 million in funding from the MacArthur… Yeah. Right? From the MacArthur Foundation, Ford Foundation, Kapor Center, Open Society Foundation, and the Rockefeller Foundation. Those are some big names. Those are some like heavy hitters backing this, which I think is really awesome. So kudos to her.
[00:03:29] JP: And none of those heavy hitters are in the tech industry, either.
[00:03:31] SY: Yeah. Yeah, exactly. This is amazing. Now this brings me to another bit of news I wanted to talk about, which is an investigation by the Financial Times about the many roadblocks researchers say they have faced while trying to do research with Facebook and its parent company, Meta. We’ll include a link to the investigation in our show notes. The piece opens with a story about how one researcher withdrew his application to research political campaigning on Facebook because the company’s contract would allow the company to review and extract anything it deemed confidential information or personal data, but the company then failed to clarify what constituted confidential information. Some of the researchers interviewed for the story compared the company’s systematic efforts to block research to Big Tobacco, ooh, that’s pretty harsh, saying that the big tech company is “setting up research institutes and commissioning research that isn’t really research.” Facebook seems to have tried to get ahead of some of this criticism by launching a tool called the Researcher API from Facebook’s Open Research and Transparency team, but the tool is only available to two dozen undisclosed research institutions chosen by Meta, which just like reeks of just…
[00:04:51] JP: Right. Sadly, with that.
[00:04:52] SY: Like what? Only 24, undisclosed, and chosen by the company.
[00:04:57] JP: Trust us. It’s fine.
[00:04:58] SY: Exactly. This is totally open and transparent. This all of course comes on the heels of the whistleblowing and massive Facebook document dump by a former Facebook employee, Frances Haugen, who testified before Congress about some of the problematic inner workings of Facebook.
[00:05:14] FH: I believe Facebook’s products harm children, stoke division, and weaken our democracy. The company’s leadership knows how to make Facebook and Instagram safer, but won’t make the necessary changes because they have put their astronomical profits before people. Congressional action is needed. They won’t solve this crisis without your help.
[00:05:36] SY: Ultimately, it’s becoming increasingly apparent that there is a crisis and conflict between research institutions and big tech across the board.
[00:05:45] JP: I mean, it’s not like Facebook would set up an organization for monitoring or oversight and then completely ignore what they say, right?
[00:05:54] SY: No! That's not on brand. That doesn’t track. That doesn’t track. Well, first of all, huge congrats to Dr. Gebru. I think that is awesome. I love that she launched it exactly one year after the firing.
[00:06:07] JP: That cannot be a coincidence.
[00:06:09] SY: That is just perfect. That’s brilliant. Also, I mean, I don’t know what it takes to raise like millions of dollars for a research institute.
[00:06:16] JP: Right.
[00:06:16] SY: That feels like a lot to get done in a year.
[00:06:18] JP: It feels like it’s hard to raise billions of dollars for a for-profit company nowadays, right?
[00:06:22] SY: Yeah, just in general, just in general. Yeah.
[00:06:23] JP: Right, in general. And now the idea behind this research firm is basically, “We’re going to research, but we’re not aiming towards profitability,” that’s a completely different realm for fundraising, I would imagine.
[00:06:35] SY: Yeah. And you have to think about just her life in the last year. I’m sure she’s dealt with tons of harassment and she’s been on a magazine cover, articles written about her, interviews she’s done. Like she’s had so much to deal with, with just the aftermath of the firing. And then on top of that, she built a whole institution. That’s just amazing. So I'm super impressed.
[00:06:55] JP: Yeah. She acknowledges they’re seeking funding from organizations that are not necessarily in the tech sphere because they want to get out from underneath the thumb of “our money comes from the very industries that we’re researching,” but she’s pragmatic about it. She says, “It’s a hard road to take because ultimately we are asking for money and contributions to produce research that will probably limit capitalistic endeavor.” And that’s tough.
[00:07:29] SY: Really excited to see what they come up with, who they partner with, what projects they release, what research they share with the public. I’m really excited to see where she takes her new institution.
[00:07:39] JP: Yeah. On the flip side, what do you make of Facebook’s, and I’m going to put this in air quotes, “research”? There’s a real difference of opinion here about what research means, isn’t there?
[00:07:51] SY: Yeah. I mean, there’s already a conflict of just independent research versus research that kind of goes through the company. And then on top of that, restricting and reviewing and only allowing an undisclosed number, like the whole thing is just so... it just goes against the whole point of research, right? And it goes against this idea of like open and transparent. And it’s not surprising. Unfortunately, it is not surprising. I didn’t sit here and read that and go, “Oh, man, what? How could they?”
[00:08:22] JP: “How could you?”
[00:08:24] SY: It feels very on-brand with what they’re doing where they are trying to save face. They’re trying to make it look like they’re doing the right thing, and it also comes back to even if they did do research the right way, so what? Then what happens? They research. They find that Facebook is destroying society, which, as you know, we’ve already talked about and which has already been discussed and shown. And then what happens? Are they really going to act on it? Are they going to change on it? You know what I mean? It’s not like they don’t know some of the harmful effects of Facebook. Nothing’s really happened. So it’s almost like it doesn’t even matter that they’re doing this.
[00:09:06] JP: It seems like companies like Meta view research as more of a marketing activity and not a way to gather information or actually improve their product.
[00:09:15] SY: Yes, a hundred percent. It feels like a way to maybe get government off its back. Maybe get ahead of legislation. Maybe kind of appease the government, appease politicians. But yeah, it feels much more like a strategic marketing plan than it does genuine interest in improving their product, a hundred percent.
[00:09:48] JP: Well. Speaking of companies keeping things secret, a Bloomberg investigation says that the Swiss company Mitto AG, which works with a bunch of companies, including Google, TikTok, and LinkedIn, to text their users things like passwords and PINs, has allegedly been helping governments secretly track and surveil people through an opaque arm of the company. Allegedly, the firm’s co-founder and chief operating officer, Ilja Gorelik, operated a surveillance service within Mitto AG that used the firm’s access to locate individuals via their cellphones. According to four former Mitto employees, only a handful of people knew of the company’s surveillance operation, and it wasn’t shared with any of its clients. And the firm also kept Gorelik’s association with the surveillance industry a secret. Mitto, which was launched in 2013, denies the allegations, stating that, “To be clear, Mitto does not, has not, and will not organize and operate a separate business division or entity that provides surveillance companies access to telecom infrastructure to secretly locate people via their mobile phones or other illegal acts. Mitto also does not condone, support, or enable the exploitation of telecom networks with which the company partners to deliver service to its global customers.” The investigation says the company worked with clients to install custom software, which could track mobile phone locations as well as retrieve call logs. One person who was known to be targeted by Mitto software was a US State Department official in 2019.
[00:11:19] SY: Yikes!
[00:11:20] JP: This is like the ultimate conflict of interest. Have you ever heard of a company secretly operating within another company?
[00:11:27] SY: I don’t know. I mean, I guess that’s the skunkworks, right? Like that’s kind of the definition of that?
[00:11:31] JP: It’s very espionage-y.
[00:11:33] SY: Very espionage-y. Yeah. I didn’t think that like actually happened anymore. I thought that was kind of like, I don’t know, an old thing or maybe something you see in the movies, but not something that actually happens.
[00:11:46] JP: Yeah. I’m curious what the differentiation is between someone just breaking the law and illegally doing this versus operating a company within another company. I guess when the perpetrator is the COO of the company, that’s when it becomes the latter.
[00:12:00] SY: That’s what I was going to say. Yeah, exactly. It’s like if it was some rogue set of employees, maybe some manager looking to get a bonus or something, okay, that’s one thing. But when it’s the firm’s co-founder and chief operating officer. And it also makes me wonder. So he had a company like this. Did the board know? Did the advisors know? The president? Who up there was privy to this information? Was it just this one rogue co-founder and his little team of people doing surveillance, or did it go up higher than that? I guess that’s what they’re investigating, right? Trying to figure that out.
[00:12:34] JP: I mean, there are a lot of concerning parts about this story, honestly. But the part I found really interesting and we’ll have links to the article in the show notes, but it’s really interesting, the technical details of how these systems work. They’re based on really old phone technologies, and state organizations like the State Department, US government, other governments, they’re outsourcing all of this work to third-party companies. Ultimately, you could have something like the State Department, which we’ve heard stories about how secure their networks are and how everybody has to have a super lockdown phone. But at the end of the day, we’ve heard from security experts, your security is only as strong as the weakest link. And if you’re outsourcing the last mile delivery of two-factor authentication codes over a text message, that’s a significant problem.
[00:13:27] SY: Yeah. And it also makes me wonder, in this case, they were actively surveilling, right? They were installing custom software. They were tracking. It was a concerted effort. But it also kind of makes me wonder, how far away are regular engineers from being able to do that kind of thing? You know what I mean? I don’t know if you read “An Ugly Truth”. It was kind of the story of Facebook over the years and all their different scandals and privacy issues and all that. And there are a number of incidents of engineers who track ex-girlfriends or people they went on dates with or just people that they’re interested in. And they have all this access to their GPS location, obviously everything they publicly post, but also things they privately share, photos, deleted photos. They have all this information and they have complete access, at least at the time this book was published, which I think it was earlier this year, to all this information that would make it very easy to surveil their users and very easy to track unknowing customers and users. So it’s interesting that this company went all the way in creating a whole operation around it and creating it kind of as a service, as a separate business, but also how hard would it be for other companies, especially social networking companies to do something like this? Maybe not on an official business capacity, but just on a “I wonder why that girl didn’t text me back” capacity.
[00:14:58] JP: Yeah. That’s a terrifying thought because I think about every place I’ve ever worked, people have access to production databases. And depending on what kind of place you work at, it might not be the most thrilling information in the world, but there’s always private information in there. And if it can happen at a security firm that is, in theory, being reviewed by different governments for their security clearances and how well they approach security, if this can happen at that kind of a firm, I mean, when I buy something online, it could easily happen there. It could easily happen on social networks or communication platforms that aren’t government regulated or aren’t being used by heads of state, but are just being used by me. So that’s terrifying. Thanks for that.
[00:15:47] SY: You're very welcome. Coming up next, we speak with one of the developers of JetBrains’ new IDE, Fleet, after this.
[00:16:14] SY: Here with us is Dennis Ushakov, Fleet Developer at JetBrains. Thank you so much for joining us.
[00:16:20] DU: Hi.
[00:16:20] SY: So tell us about your developer background.
[00:16:23] DU: I’ve been working at JetBrains for about 13 years. I used to work on RubyMine for quite a while, then I was the WebStorm team lead. About a year and a half ago, I switched to the Fleet team.
[00:16:38] JP: Can you tell us a little bit about JetBrains, the company, what it does and what your role is there?
[00:16:44] DU: JetBrains makes tools for software developers and creative professionals. We have a pretty big portfolio of solutions, like, for example, Space, which is an integrated team environment. Also, we have lots of IDEs, a continuous integration server, and an issue tracking system. So yeah. My current role is developer at Fleet. I mostly work on its Git integration and spend some time adding some stuff to the editor.
[00:17:19] SY: So Fleet is the new IDE that you’ve come out with that is currently in preview. Can you tell us a little more about it?
[00:17:26] DU: Fleet is what we call the next-generation IDE, which aims to be fully remote and also fast and snappy, and it also supports collaborative development, which is quite popular these days.
[00:17:39] JP: You mentioned that you primarily work on integrations and sometimes the editor in Fleet. Can you tell us a little bit about the nature of that work? Specifically, on a day-to-day basis, what kind of things are you implementing in Fleet?
[00:17:53] DU: Sure. For example, if you want to have Git support in your IDE, you expect it to do at least some basic things, like being able to see what you’ve changed compared to the version that is checked in, like new files or changed lines. You would like to be able to commit new files, and if we are talking about Fleet, you would like to stage and unstage them. Also, you want to see the history of the repo or a specific file, and you would probably like to see the blame, like who changed that file and when, and things like seeing what commits are incoming into your branch or outgoing. So that’s the thing I’m adding to the Git integration, and those are the things that are somewhat working in the preview.
[00:18:52] SY: Somewhat working is a good start for sure. So what tools and languages does your team use to actually make Fleet? What’s it built in?
[00:19:00] DU: It’s built in mostly Kotlin.
[00:19:02] JP: Oh, wow!
[00:19:02] DU: So we have our custom UI framework, which is kind of similar to Jetpack Compose. Unfortunately, Jetpack Compose wasn’t there when we started doing Fleet, so we had to invent something for the UI. Also, we have a small binary written in Rust that’s used for native operations.
[00:19:24] JP: So one thing that struck me, following the discourse on Twitter about Fleet, was that someone from JetBrains responded to a question asking whether Fleet was made with Electron or based on VS Code at all, and they said that it was not. And I just wanted to ask you about that. Is it true that Fleet isn’t based on Electron? It sounds like it’s not, if it’s written in Kotlin. And why is that? Why do you think so many other IDEs are using Electron and why doesn’t Fleet?
[00:19:52] DU: Yeah, it’s not based on Electron. We also have our own rendering framework, and the UI framework is custom. So why not Electron? Because I think everyone is kind of feeling that Electron-based solutions are perceived as slow and they like to consume a lot of memory. So we wanted to avoid that.
[00:20:18] SY: So what makes Fleet unique from the other IDEs out there?
[00:20:23] DU: I think what makes Fleet unique is our architecture. So if we are talking about an IDE, usually you have your source code, your editor, your code-processing engine, and your running application all on the same machine. That kind of used to work back then, but right now we have, for example, applications with a lot of microservices, or some applications that are not easy to launch on your laptop or your development machine. When you come into a new company, you always spend some time like, “I need to check out this repo. I need to build and install that service and that other service.” And it feels like that’s not the ideal experience. So Fleet is fully distributed. You can have separate components running on different machines. For example, you can have the front end on your local machine, where you actually see the source code, but the source code might be located somewhere in a different place, and the code-processing engine doesn’t necessarily have to be there either; it can even be on another, beefier machine. So Fleet is distributed, and you can have different combinations of services running on different machines. Of course, we still support the local scenario where you have everything on your local machine as well.
[00:21:54] JP: So do you think that ability to have your workspace locally or in the cloud, do you think that is a big feature that will attract developers? What do you think is one of the more key features of Fleet that will attract developers?
[00:22:10] DU: I think that’s not the only feature that makes this attractive. We feel that Fleet looks attractive by itself. We’re actually quite happy with the UI that we have. Of course, it needs some polishing, but it looks good. Also, Fleet is quite fast. Even if you have a big project and you have IntelliJ running on your local machine, it’s still fast and you can type in the editor faster. It doesn’t become slow because of the indexing or stuff like that.
[00:22:41] JP: Is there anything in this preview release that you’re hoping will be there by the time it’s time for full release? Is there anything that you’ve got your eye on that you hope makes it in?
[00:22:53] DU: I hope that all the languages from the IntelliJ platform will make it into the final release. We would like Fleet to be really a polyglot, so you don’t have to switch tools when you are working on different projects with a different mix of languages.
[00:23:09] SY: And what are some things that might be included in this IDE in the future?
[00:23:14] DU: Right now, we don’t have many refactorings, just a couple of them. So that’s the thing that we would like to include. We also would like to include better scenarios, if you would like to set up your cloud development yourself. Right now, you can have a one-click solution if you use JetBrains Space, but some additional setup is needed for other scenarios.
[00:23:39] JP: What are some of the biggest challenges that you and your team encountered while you were building Fleet?
[00:23:46] DU: I think for quite a few of us, having a new UI framework, it was quite a big challenge.
[00:23:53] JP: Was that just because it represented something brand new you had to get into?
[00:23:58] DU: Yeah, we had to switch our thinking from the Swing conceptions to the reactive ones.
[00:24:04] SY: Is there anything that we didn’t cover yet that you want to talk about?
[00:24:08] DU: We can talk about our current state of affairs. We’ve got about 130K submissions for our preview program.
[00:24:18] SY: Nice!
[00:24:19] DU: It does make us pretty happy. And what’s especially cool is that around 20K of them are new to JetBrains. So they don’t use other products from JetBrains, which makes us especially happy.
[00:24:34] SY: That’s really cool. Well, thank you so much for being here.
[00:24:37] DU: Thank you.
[00:24:44] SY: Coming up next, we talk about social media algorithms and specifically about a TikTok internal document that was obtained by the New York Times, which goes into detail about how their algorithm works after this.
[00:25:05] SY: Joining us is Julian McAuley, Computer Science Professor at the University of California, San Diego. Thank you so much for being here.
[00:25:12] JM: Yeah. Thanks.
[00:25:14] SY: So Julian, tell us about your computer science background.
[00:25:17] JM: I’m a professor of machine learning in the Computer Science Department at UCSD. Most of the work I do is on what I kind of call personalized machine learning, which is about how you adapt machine learning algorithms to settings where there’s a lot of variability among users. So a classic example of this is something like recommender systems. If we want to suggest a movie on Netflix or a product on Amazon, we have to account for things like people’s different backgrounds and different personal preferences, their contexts, so on and so forth. So obviously people like to interact with different things, and you need to build predictive algorithms that account for those differences among individuals. But yeah, outside of recommender systems, this kind of research has lots of implications for things like personalized healthcare, natural language processing, building dialogue agents, and all sorts of interactive systems. So yeah, basically taking all sorts of ideas from machine learning and applying them in settings where we need to understand differences among people.
[00:26:13] JP: You were recently quoted in a New York Times article entitled, “How TikTok Reads Your Mind.” Can you talk a little bit about what that piece was about and your contribution to it?
[00:26:22] JM: Yeah. Funnily enough, a lot of people have been asking me about the TikTok algorithm over the last few months. I’m not really a social media guy, and given how boring some of my responses are, they’re just like, “Oh, yeah, the algorithm is kind of normal,” I’m sort of surprised that people keep asking me about it. Yeah, the New York Times, I guess, had some kind of internal document describing the underlying structure of TikTok’s recommendation algorithm. At least anecdotally, it seems that TikTok is a platform that is very addictive. It’s not exactly clear where that addiction comes from. I think part of it comes from the fact that young people actually like watching these short videos for some reason. I suspect it has more to do with that and less to do with any nefarious or malign properties of the recommendation algorithm. Of course, on the other hand, it is nonetheless the case that these recommendation algorithms do try to optimize metrics like engagement or session length or watch time or things like that, which can indirectly lead to addictive outcomes. But yeah, I don’t think there’s anything so different about how this recommendation algorithm appears to work compared to what I would expect is industry standard.
[00:27:35] SY: So by the title, the author makes it seem, as you said yourself, that the TikTok algorithm is this super-secret sauce, this really sophisticated, crazy thing that makes TikTok so addictive. And you said that the recommendation engine in the article is “totally reasonable, but traditional stuff.” Can you talk about what some of this traditional stuff is that you mentioned in terms of these algorithms?
[00:28:00] JM: I guess there are sort of two parts to a recommendation algorithm, or at least to this kind of recommendation algorithm. One is sort of the machine learning part, where we’re actually making predictions: What will you click on? Will you watch this video to the end? Will you click on an ad? Will you like this product? This is really the job of machine learning, to take tons and tons of historical interaction data and use that to build some kind of black box which estimates a probability. It says, “If I expose this user to this stimulus, how will that user react? Will they click on this thing? If they click on it, will they watch till the end?” So on and so forth. These are the kinds of estimated probabilities, like an estimated probability of a click, that I think are key in that article. Then the second part of a recommendation algorithm is kind of, what do you do with that? What do you do with this black box? Given that I can estimate pretty accurately what you’ll click on, do I show you a bunch of things that maximize the click probability? Do I inject some diversity into those recommendations? Do I try and down-weight things that are too clickbait-y, which might mean you’ll click on them, but you’re very unlikely to actually enjoy or consume the content all the way through? So there are all these complicated moving parts, which are not really the machine learning algorithm exactly. Those are design choices to make the machine learning algorithm produce something satisfactory. So this recommendation algorithm consists of all of these interacting moving parts, and people are attaching different weights to them. And the content creators may also be a little bit adversarial; they want their content to be promoted to the top. So maybe they’ll make things very clickbait-y, and we have to down-weight clickbait a little bit, and we manually choose all these knobs.
That’s kind of a rough characterization of how any large-scale industrial recommender is going to work. You’re going to have this machine learning component, which estimates these sort of immediate responses that users will have, and you have this design or systems engineering component that says, “What do we actually do with this thing? How do we build something out of it?” And I think this document fits into that mold pretty well. I mean, you see that, yeah, TikTok does use things like estimating what you’ll click on, estimating if you’ll watch to the end, how long your session is going to be. But that’s what anyone would be using, right? And they do things like make sure that not too much of the content is clickbait. So they’re thinking about those basic things. If someone told you Spotify was doing this, if Spotify was predicting which songs you’ll listen to, and moreover predicting which songs you’ll listen to to the end, because it’s more satisfactory to listen to the entire song than to get halfway through, I think nobody would accuse Spotify of optimizing for addiction, right? Maybe with TikTok, people are a little bit more sensitive because the content might be addictive.
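The two-part structure McAuley describes, a learned probability estimator plus hand-tuned ranking rules on top of it, can be sketched in a few lines. Everything below is illustrative: the field names, weights, and clickbait penalty are invented for the example, not anything from the TikTok document, and hardcoded probabilities stand in for a trained model.

```python
# Toy sketch of a two-part recommender: model-estimated probabilities
# per item, combined by a hand-tuned scoring formula that down-weights
# clickbait so raw click appeal doesn't dominate the feed.

def rank_feed(candidates, w_like=1.0, w_finish=2.0, clickbait_penalty=0.5):
    """Score each candidate video and return them best-first.

    Each candidate carries model-estimated probabilities (p_like,
    p_finish) plus a clickbait flag from a separate classifier.
    """
    def score(c):
        s = w_like * c["p_like"] + w_finish * c["p_finish"]
        if c["clickbait"]:
            s *= clickbait_penalty  # manual "knob": demote clickbait
        return s
    return sorted(candidates, key=score, reverse=True)

videos = [
    {"id": "a", "p_like": 0.9, "p_finish": 0.2, "clickbait": True},
    {"id": "b", "p_like": 0.5, "p_finish": 0.8, "clickbait": False},
    {"id": "c", "p_like": 0.3, "p_finish": 0.4, "clickbait": False},
]

print([v["id"] for v in rank_feed(videos)])  # -> ['b', 'c', 'a']
```

Note that without the penalty, video "a" (high click appeal, low finish rate) would score 1.3 and beat "c"; the penalty halves it to 0.65, which is exactly the "down-weight stuff that seems clickbait-y" design choice the interview mentions.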
[00:30:57] JP: You had mentioned that everybody’s been asking your opinion of TikTok’s algorithm lately. Do you feel like the New York Times piece might have overblown a little bit what this document about TikTok’s algorithm actually reveals? And the other question I wanted to ask was just broadly, why do you think we’re all so interested in how these algorithms work?
[00:31:19] JM: Well, yeah. Regarding the New York Times piece, basically, the document didn’t appear to me to be that much of a hot potato. To me, yes, it seemed like standard stuff, but I can totally see how, to somebody who’s never seen such a document or who has less familiarity with how recommendation engines work, it can be a fascinating topic to see all of the pieces that go into it. And leading into your second question, I think to a layperson or to a user of a recommender system, they do seem kind of magical sometimes, or maybe creepy is the better word. They seem to know more about the user than you would expect them to. That’s the way that, at least as the New York Times article has quoted, some people perceive TikTok. It seems to know so much about its users, and people get creeped out by that, and they don’t know how it knows so much. They feel that there’s some kind of nefarious or extreme harvesting of personal data. I think to someone more familiar with the algorithms, almost the opposite is true: the algorithms are fairly simple, and they work well and seem magical just because there are so many users and so much data. They get recommendations just right for you because they have a billion users. And at that point, there are going to be users who are very similar to you, or there’s going to be a combination of users whose behavior collectively is very similar to yours, or maybe you’re just not as unique as you might’ve imagined you are. So the algorithm can work fantastically well, but it’s not necessarily some great algorithmic complexity. It’s more about the volumes of data and discovering patterns among huge numbers of people.
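The point that simple methods plus lots of data go a long way can be illustrated with a bare-bones collaborative filtering sketch: find the user most similar to you and recommend what they liked. The users, interests, and similarity measure below are made up for the example; real systems use far richer signals, but the core idea is this simple.

```python
# Minimal user-based collaborative filtering: "you're not as unique as
# you imagine" -- with enough users, someone's likes overlap yours a lot.

def recommend(target, history):
    """Recommend items the most-similar user liked that `target` hasn't.

    `history` maps user -> set of liked items. Similarity is plain
    Jaccard overlap between like-sets.
    """
    mine = history[target]
    def jaccard(other):
        theirs = history[other]
        return len(mine & theirs) / len(mine | theirs)
    neighbor = max((u for u in history if u != target), key=jaccard)
    return sorted(history[neighbor] - mine)

likes = {
    "you": {"cats", "cooking", "chess"},
    "u1":  {"cats", "cooking", "chess", "climbing"},
    "u2":  {"cars", "crypto"},
}
print(recommend("you", likes))  # -> ['climbing']
```

With three users this looks trivial; with a billion, the nearest neighbor (or a weighted combination of many near neighbors) gets uncannily close to your tastes, which is the "magic without algorithmic complexity" McAuley is describing.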
[00:32:55] SY: So that makes complete sense to me, this idea that ultimately you can have a really fancy algorithm, but it’s really the data collection and being able to go through the huge amounts of data and have access to all that information that really makes the magic happen. But it feels like TikTok’s popularity and addictiveness has been there really from the beginning before it blew up and became this worldwide success and phenomenon. And I’m wondering, how do you think a social media app like this is able to become so addictive before it gets that big without having all that data at the start?
[00:33:28] JM: Sure. Well, I’m not a psychologist by any means, but to give a boring response, I think part of it is the nature of the content. It would probably be very difficult to make an extremely addictive recommendation platform if you were recommending two-hour documentaries to people. Whereas this very short-form, chaotic content seems to have struck a chord that people just find very addictive. I don’t know much about the psychology of addiction, but it seems to have exactly the kinds of pieces that make people just want to keep consuming more content. Maybe it matches the attention span of people these days or something. I don’t know about that.
[00:34:14] SY: So you think it’s really not even the algorithm, it’s just almost the editorial decision, right? At the very beginning of what type of content they think is going to work, what type of videos are going to be addictive, kind of starting from there and then adding the algorithm on top of that, is that kind of the way you’re thinking about it?
[00:34:30] JM: Yeah, perhaps. I mean, the algorithm is important, but having excellent content, or very addictive content, is probably a more critical piece than the algorithmic choice. Even if they just recommended whatever content is most popular, that would be the totally naïve, most trivial algorithm you could have. If the content is good, if it’s addictive, if it’s engaging, that will result in a platform that’s highly addictive, even without some complex recommendation magic. And I should say it doesn’t take long for there to be a lot of data. You might not have a billion users right away, but this type of platform has users who are very engaged and who are providing feedback constantly, so you’ll learn a lot about what kind of content people interact with very quickly. Within sessions a few minutes long, you can kind of tell what users are engaging with and what they’re not, and quickly discover patterns that help you make effective recommendations.
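[The “most popular” baseline Julian calls the most trivial algorithm you could have can be sketched in a few lines. The engagement log below is invented purely for illustration:]

```python
from collections import Counter

# Hypothetical engagement log of (user, item) pairs; all names are made up.
events = [
    ("u1", "video_a"), ("u2", "video_a"), ("u3", "video_a"),
    ("u1", "video_b"), ("u2", "video_b"),
    ("u3", "video_c"),
]

def most_popular(events, n=2):
    """The naive baseline: rank items purely by total engagement count."""
    counts = Counter(item for _, item in events)
    return [item for item, _ in counts.most_common(n)]

print(most_popular(events))  # ['video_a', 'video_b']
```

[Everyone gets the same feed under this baseline, which is exactly why it is the naïve starting point: any personalization at all has to beat it.]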
[00:35:31] JP: So the media focuses a lot on the algorithms that social media companies are using. Case in point, the New York Times piece said they received the TikTok document from a person who shared it because they were disturbed that the app was pushing what they said was sad content toward users, and they were concerned that could lead to self-harm. I’m wondering, do you think we’re seeing the result of product decisions and business decisions, and that the algorithm just grows out of those, that it’s being directed by what the business wants? Do you think it’s possible to use an algorithm in a neutral manner? Or is it always going to reflect the end goals of the person behind it?
[00:36:21] JM: That’s a complicated question.
[00:36:22] JP: It was, I apologize.
[00:36:25] JM: But with respect to sad content, at least, you can look at the document. It’s not like there’s a prediction of how sad the content is and a knob that they’ve dialed up to 11 to make things sad. There’s no deliberate tuning toward this type of content. That being said, it’s easy for an algorithm to accidentally lead to some negative outcome, and I think a lot of the bad outcomes we’ve seen from recommender systems lately are more like accidents rather than a deliberate misuse of the algorithm. Think about things like polarization on Facebook, for instance, or the tendency of YouTube to recommend videos that are more extreme than the ones you’re consuming, so if you like slightly right-leaning videos, it might recommend you more extreme right-leaning videos. I don’t think those are deliberate design choices by the systems engineers at Facebook or YouTube. I think those are undesirable outcomes caused by the feedback loops that exist within recommender systems, rather than the result of deliberate decisions by algorithm designers. To get back closer to your question, these are things you can reactively correct for. If you find that your recommender system is recommending content that’s too extreme, you can potentially turn that down. Or if you find that people are being exposed to extremely polarized articles, you can change what they’re exposed to. Likewise, you can see in the TikTok document that clickbait was a problem, so they had to deliberately down-weight stuff that seemed clickbait-y. So I don’t think the algorithm is just reproducing what the designers want. Mostly I think the bad outcomes of recommendation happen more by accident than by design.
[00:38:16] SY: So in that situation then, this kind of goes to the Facebook papers, the documents that were released a little while ago and kind of the bigger question of whose fault is it, who’s in charge, who’s responsible. And in this world, what is your opinion on that? Who do you kind of hold accountable for these unintended consequences? Is it the developers? Is it the company? Is it just nobody and we just kind of live with it? How do you make sense of that?
[00:38:44] JM: Oh yeah, that’s a complicated question too. I would certainly hold Facebook somewhat accountable, because even if they didn’t deliberately cause the problem, even if it was an unintended result of a recommendation algorithm, it’s still a problem they know about and a problem they could correct. I would also say these problems are not intractable. They can be corrected. You can build a recommendation algorithm that does expose people to diverse views, or does lead people away from extreme content, or does avoid sad or potentially harmful content. These are not algorithmically difficult corrections to make. I think the challenge is getting anyone to want to do it when it perhaps runs counter to the more immediate goal of keeping users engaged.
[00:39:36] SY: Right.
[00:39:37] JM: So they could do something about it. They’re opting not to. I mean, perhaps they’re acting out of their own self-interest, and it’s hard to expect them not to.
[00:39:47] JP: Is there anything in the TikTok document that surprised you?
[00:39:52] JM: No, I don’t know. Maybe I was not very attuned to be surprised. Given all the buzz around it, I was perhaps surprised that it’s so traditional. They didn’t seem to have cracked some magic code in the document that makes things work fantastically. It was fairly traditional stuff. And yet, as we’ve said, they have this extremely effective platform that has been so addictive to users.
[00:40:17] JP: So you were basically surprised that you weren’t more surprised?
[00:40:19] SY: Yeah. Well, thank you so much for joining us.
[00:40:23] JM: Yeah. Cheers! My pleasure.
[00:40:35] SY: Thank you for listening to DevNews. This show is produced and mixed by Levi Sharpe. Editorial oversight is provided by Peter Frank, Ben Halpern, and Jess Lee. Our theme music is by Dan Powell. If you have any questions or comments, dial into our Google Voice at +1 (929) 500-1513 or email us at [email protected] Please rate and subscribe to this show wherever you get your podcasts.
[00:41:07] JP: We need T-shirts to say, “After this.”
[00:41:11] SY: That’d be amazing.
[00:41:12] JP: Wouldn’t that be awesome?
[00:41:12] SY: I want a mug. I want a holiday gift, a holiday gift from Forem. I want a mug that says, “After this.”
[00:41:19] JP: DevNews logo on one side, after this on the other.
[00:41:21] SY: Yes.
[00:41:22] JP: Yes.
[00:41:22] SY: That’s what I want.
[00:41:23] JP: Okay.
[00:41:23] SY: Levi, you got to make this happen.