Season 5 Episode 5 Aug 19, 2021

DeepMind’s XLand, Android 12 Beta's Camera Switches, a Colorism Issue With Face Filters, and a Senior’s Robot Companion

Pitch

A playground for machine learning!

Description

In this episode, we talk about social media face filters perpetuating colorism, and about a new companion robot for the elderly. Then we talk about DeepMind’s new exciting AI training tool, XLand, with Max Jaderberg, senior staff research scientist at DeepMind. And then we speak with Suzanne Aitchison, software engineer and accessibility specialist here at Forem, about Android 12 beta’s “Camera Switches,” which lets users control their phone with facial expressions.

Hosts

Saron Yitbarek

Disco - Founder

Saron Yitbarek is the founder of Disco, host of the CodeNewbie podcast, and co-host of the base.cs podcast.

Josh Puetz

Forem - Principal Engineer

Josh Puetz is Principal Software Engineer at Forem.

Guests

Max Jaderberg

DeepMind - Senior Staff Research Scientist

Max Jaderberg is a researcher at DeepMind leading the Open-Ended Learning team, driving the intersection of deep learning, reinforcement learning, and multi-agent systems. His recent work includes creating the first agent to beat human professionals at StarCraft II, and creating algorithms for training teams of agents to play with humans in first-person video games.

Suzanne Aitchison

Forem - Software Engineer

Suzanne Aitchison is a software engineer at Forem. She is passionate about accessibility and maintains a tutorials site for accessible web development (particularly with reference to React).

Show Notes

Audio file size

43237607 bytes (approximately 43 MB)

Duration

00:45:02

Transcript

[00:00:10] SY: Welcome to DevNews, the news show for developers by developers, where we cover the latest in the world of tech. I’m Saron Yitbarek, Founder of Disco.

 

[00:00:19] JP: And I’m Josh Puetz, Principal Engineer at Forem.

 

[00:00:21] SY: This week, we’re talking about social media face filters perpetuating colorism and about a new companion robot for the elderly.

 

[00:00:29] JP: Then we’ll talk about DeepMind’s exciting new AI training tool, XLand, with Max Jaderberg, Senior Staff Research Scientist at DeepMind.

 

[00:00:37] MJ: By trying to train on all of these different games, rather than just one or two, you’ll get very general behavior out of agents, which are able to play any sort of game, even ones they haven’t encountered during training.

 

[00:00:47] SY: And then we speak with Suzanne Aitchison, Software Engineer and Accessibility Specialist here at Forem, about Android 12 beta’s Camera Switches, which lets users control their phone with facial expressions.

 

[00:00:59] SA: So this is all like very customizable. So maybe I’ve set it so that when I look down, it’s going to scroll down for me. Or maybe when I look right, it’s going to go to the next interactive item.

 

[00:01:11] SY: So we’re starting this episode out by talking about a great piece in the MIT Technology Review titled, “How digital beauty filters perpetuate colorism.” So the piece goes into some of the history of colorism, which you can even see today in the ways in which beauty brands will often try to make their POC models look whiter or mixed race by adding things like freckles, contacts, and contouring in ways that make their noses look smaller and slimmer and have more stereotypically white features. And now we’ve seen the same types of colorism morphing through social media filters that can convincingly alter the look of someone’s face by leveraging really powerful facial recognition algorithms. Now the perpetuation of these limited beauty standards, like small noses, big eyes, lighter skin, fuller lips is even easier. The piece also goes on to say that according to new research, beauty ideals are narrowing faster than ever, especially among young girls. Ultimately, we have yet again another use of AI, which instead of somehow moving us to some techno utopia, instead drills down on something problematic, but increases the speed and breadth of these problems. It makes the future look pretty grim for brown girls who are now having the idea that they aren’t beautiful in their own right constantly reinforced. So Josh, as a white man, what do you think about this?

 

[00:02:36] JP: Let me give you my opinion. Okay. So Clueless White Guy 101, I had no idea. Okay. I kind of knew colorism was a thing. I had no idea about beauty brands giving models freckles, lightening their skin.

 

[00:02:52] SY: Yeah, to be fair, black people can have freckles, but much less frequently. When I think of colorism, I think specifically just skin color. So it was interesting to think about eyes and shape and all these other things. One thing that’s kind of always bothered me about, I watch, not a lot, but a decent amount of beauty vloggers and beauty influencers, that sort of thing. And the foundation color that they choose is always like pretty significantly lighter than their actual skin color.

 

[00:03:24] JP: Whoa!

 

[00:03:25] SY: It really bothers me.

 

[00:03:26] JP: No. That would bother me too.

 

[00:03:28] SY: Yeah. Look, I don’t know if they understand that it is or they just think they’re doing a good job matching it. I don’t know where it comes from. I don’t know if these beauty influencers are going, “Ooh! Here’s my opportunity to be whiter today.” You know what I mean? I don’t think it’s that explicit, but I do notice. There’s definitely a few influencers who do their concealer, they do their highlights and you kind of compare the before and the after. And the after, you’re like at least a shade or two lighter than your natural skin complexion.

 

[00:03:55] JP: Whoa!

 

[00:03:56] SY: And I don’t know, it makes me a little upset. I don’t like that. I want you to fit what you actually are and wear that proudly. You know?

 

[00:04:04] JP: Yeah. From what I understand about concealer, and full disclosure, I don’t wear makeup. I have a 13-year-old daughter who does wear makeup and is super into beauty TikTok and beauty YouTube. She plays around with makeup and coloring her face and doing her eyeshadow. And do not tweet me about appropriate ages for makeup for kids. Thank you. Because she’s really good at it. She looks amazing and she’s so proud of the looks that she pulls off, but she has complained to me about concealer and foundation, just in the terms of like she’ll be in the store and she’s like, “Oh, I didn’t write down what my color is.” She’ll be like comparing it to the back of her hand. And she’s like, “I want to pick one that looks like my skin color.” So the idea that however subtly an influencer would be going a shade or two lighter, this is how it happens, right? It’s a thousand little cues that you pick up in social media that your face isn’t quite the right shape, your skin isn’t quite the right shade, your bone structure isn’t quite right. I think the surprising thing to me about this article, it was pointing out that like, “Hey, facial filters, you could put dog ears on your face and give yourself a cute little bunny nose. Or you could start to layer these historically really biased notions of beauty on top of your face,” and that’s super problematic.

 

[00:05:28] SY: I’m really interested to see how beauty filters and the beauty industry intersect. Because one thing we’ve definitely seen in the last, let’s say three to four years in the beauty industry, is more inclusive makeup. As much as there are influencers who are, I think, picking shades that don’t quite match, there are definitely way more shades to pick from, and there are darker skin tones and there are a wide range of shades that you can pick. So if you want to find your exact match, you can do it much more easily today than you could just three years ago. There’s been a lot of pressure on beauty brands to be more inclusive, specifically in terms of matching the right color. So I feel like the beauty industry is generally moving in, I don’t know if it’s the right direction but in a better direction, but filters are not. Filters are kind of stuck in the old ways. And so I’m wondering as beauty hopefully continues to be more inclusive and continues to allow people or to encourage people to be proud of the way they already look, will beauty filters catch up to that? And that’s kind of the other part of it as well is what is the role of the developer? As a developer, if I know that women want slimmer noses and want brighter skin, let’s call it that, brighter skin, do I go, “No, I disagree with you,” and risk potentially losing those users or having them go somewhere else? Is it kind of my place to deny certain features because I personally don’t believe in it or I think that it is maybe not explicitly harmful, but definitely, like you said, paper cuts, definitely adding and perpetuating certain things? Or do I let the users decide? We’re supposed to listen to our users and we’re supposed to build what the customers love. So how do you kind of balance that as a developer? How do you make the decision of, “I don’t want to perpetuate certain norms, but also I want to make sure I’m building something people actually want”?

 

[00:07:31] JP: I was thinking about this in the context of misinformation and notifications that social networks are trying to give users about COVID vaccine misinformation, political election misinformation, and I wonder if there’s a place, I actually would argue, there is a place for these photo apps and social media apps to start giving users information to say like, “Hey, this filter changes your look. Know that this could affect your self-esteem. If you are feeling bad about yourself, here are resources,” and maybe that’s not the best…

 

[00:08:08] SY: Very interesting.

 

[00:08:09] JP: Maybe giving someone a link dump to like feeling better about themselves isn’t like the best thing, but I almost feel like we put warnings on so many things in the physical world now that we know can hurt people physically. And we’re starting to do it with, you know, using this app too much could impact your attention. It could impact your wellbeing. Digital wellbeing is a hot topic. I think it could be expanded to ideas of self-image and your pictures, especially in apps where you see some people’s social media profiles, and it’s all just selfies. And some of them are very proud of their image and others are posting a lot of images, trying to feel better about themselves through filters, through affirmation, from their friends. And I wonder if there’s a place in these networks for educating users to let them know, “Hey! This could, like, not be helpful for you. You might be experiencing some mental duress or you might be at a bad place because of the filters that you’re using.”

 

[00:09:10] SY: That is so interesting. It’s like content warning, but for social media posting.

 

[00:09:18] JP: Right. Right. For the content you’re putting out there. Right?

 

[00:09:21] SY: Yeah, yeah, yeah. And the tools you’re using to put that out. That is a fascinating idea. I would be very interested to see what that would look like. Yeah.

 

[00:09:29] JP: I mean, this data must exist, right? If I'm writing a photo app and I could tell like, “Oh, this user, 98% of the time posts a selfie with a filter on it and they never post a selfie without a filter.” Maybe we can suggest, “Hey, you look great by yourself. Post a picture by yourself.” I was going to ask, have you worn makeup for a very long time?

 

[00:09:48] SY: Yeah. Yeah. I’d say that.

 

[00:09:49] JP: I mean not like during the day, but in your lifetime?

 

[00:09:52] SY: Yeah. I’ve had this face on for literally 48 hours. No. Yeah, since college, I’d say.

 

[00:09:59] JP: Okay. So my question to you is this is something I’ve always wondered about the beauty industry. Do you think the cosmetic and beauty industry and by extension filters and Instagram influencers, do you think it can actually promote the idea that as women, you look great on your own? Because at the end of the day, isn’t the industry set up, like, to sell you the idea that you need these additional products to look good?

 

[00:10:28] SY: So for me, I’m very lazy when it comes to makeup. So as much as I appreciate a good contour, I will never, ever in my life do it because it just takes too much work. I’ve seen the influencers put on layers and layers of makeup to get that look, and it’s just not that valuable to me personally. So for me, I’m not going to be influenced just because I just don’t care. I don’t care enough. But one thing that I have seen, again, recently, I’d say maybe in the last five to ten years, is this idea of natural beauty of just very light makeup, just enhancing your existing qualities, I would say Glossier is one of the companies that really led this, where they’re basically like, “Look, all you need is just a little bit of concealer, just like for that, those bags under your eyes, that’s it. Maybe a little hint of gloss just to get a little shine on those lips,” and very, very minimal, very, very minimal makeup. And that to me was really attractive because I thought, number one, super easy. It takes me like three minutes to put on makeup in the morning, and that is a type of branding message to me that really makes me feel that I’m already good enough. I’m just finding the best version of myself versus changing myself.

 

[00:11:45] JP: Interesting.

 

[00:11:46] SY: And so that really, really spoke to me where I thought, “I can look in the mirror and I’m most of the way there.” I’m like 95% of the way there. I just need a couple little tools, a little mascara, a little bit of this and that, just to really bring out the beauty versus feeling like, “Man, we need to go to work.” So I don’t think that I’ve been brought down, like my self-esteem has been hurt by the beauty industry, but my self-esteem has definitely been improved by this natural beauty, clean beauty aesthetic that has become really popular recently.

 

[00:12:19] JP: Interesting. I think about this. Having a daughter, I think back to my own childhood and never at any point cosmetics or the idea that I should put something on my skin or my body to make myself look a little better, that wasn’t very communicated to me as a male growing up. But with women, it’s obviously very, very communicated. And I try to relate this back to the developers, like thinking about the filters, you don’t see a lot, and I looked for research, I looked, I didn’t see a lot of filters to give myself a square jaw or to fill in my hairline.

 

[00:12:56] SY: That is interesting.

 

[00:12:57] JP: Right? Oh, I’d like to fill in this hairline. Let’s take 20 years off me. Let’s pop up my biceps a little bit. I didn’t find those filters on TikTok.

 

[00:13:05] SY: That is an interesting point.

 

[00:13:07] JP: Right? So as developers, we should probably be asking ourselves, “Are these filters, these features that we’re producing, are they applicable to everyone?” And if they’re not, we should probably think a little harder about them or at least challenge our product people about them.

 

[00:13:21] SY: Yeah. I can totally see a world where we have equal filtering happening for everybody. Yeah. That’s interesting. We’ll see how the trends of beauty again. Does that match? Does that kind of catch up with digital filtering?

 

[MUSIC BREAK]

 

[00:13:46] JP: So let’s move on to an AI-related story that’s a little more uplifting. This is a story that was in Fast Company and it’s written by Deanna, an elderly woman who is beta testing a companion robot named ElliQ. So ElliQ looks like a cylindrical lamp and it can intelligently correspond with people. In the piece, Deanna writes, “She makes excellent conversation and has a wicked sense of humor. She’s a marvel of AI technology and every time she and I chit chat over breakfast or a cup of tea, I feel proud knowing that I’m helping to make AI technology even better. Older adults like me are often overlooked in the discussion about technology, but the age tech market is vast and growing.” I love this story because I think it’s really important to highlight and uplift older people, not only learning to code, but also using technology. And with the age tech market growing, it’s going to be even more important to include more elderly developers on your team and elderly users of your products. Deanna also talks about how even before a global pandemic, it could be difficult or even dangerous for seniors to go out and be social with friends. And as she’s interacted with her robot more and more, it’s become even more intelligent and has become almost like a real friend. I think this quote really sums it up well, “I understand that ElliQ is a robot, but for me she functions much like a human companion. She’s also a part of me. Her personality is now linked to mine. Just like it would be linked to someone else’s if she had been beta tested in a different home.” We’ll link to this article in our show notes, and it’s a really heartwarming read. This gave me all the feels.

 

[00:15:20] SY: I love this read.

 

[00:15:21] JP: I love it just for like the feel-goodness of it, but I love it because it reminded me that elderly people use technology as well and we don’t hear about it very often.

 

[00:15:32] SY: No, we really don’t. And someone said this to me recently. They were, like, it’s silver tech. So silver haired people and technology that’s kind of made for them, made to really help them get their jobs done, their tasks done, to make sure they’re connected to the world. And there’s not a lot of money in silver tech. That’s a huge potential market, especially if you think about how as millennials get older as well, like we’re definitely going to need technology for us too, and we’re already tech savvy. So I think it’s smart to invest in the older population and we just don’t do that.

 

[00:16:07] JP: I think there could be a lot of money in silver tech if you consider that maybe it’s not just the elderly buying these devices for themselves, but it’s their millennial children. It’s their Gen X children buying them for them. I think about all the technology I’ve bought for my parents over the years. At their request, they needed help picking out something. They were asking for recommendations. I found something that I thought could help improve their lives and bought it for them. I think there’s a vast untapped market here.

 

[00:16:35] SY: Yeah, absolutely. And I love this story also because it’s written by an older person. It’s not written by a journalist who interviewed them, but it’s written in her own words, which really is beautiful and kind of gives a voice to the older generation. You said it’s really important for us as developers to not only build more inclusively across ages, but also to include different ages in our product development process.

 

[00:17:01] JP: Right.

 

[00:17:02] SY: And just reading this article and hearing the pride that Deanna takes in beta testing this robot and saying, “I’m making the world a better place. I’m helping solve a problem. I’m helping attack this huge problem of loneliness.” You just hear that pride through her words in this article. And it just is a great reminder that it’s important for us to hear directly from those users and to include them in our development process.

 

[00:17:27] JP: Yeah. I could definitely imagine a product development cycle where a product like this was suggested. And if you weren’t considering the elderly as a potential audience for it, you might be like, “Well, who would want a companion robot? You can go out. You have a lot of friends. Who’s sitting at home?” But as soon as you start to think about the elderly as a potential audience, oh, it makes sense. They have much different needs than younger tech users would.

 

[00:17:53] SY: Absolutely. And it’s also kind of cool to think about how AI and machine learning and robotics is kind of this highly technical, challenging kind of new thing. And when you think about the older generation, you generally don’t think of them as highly technical or comfortable with technology. And in this world, they are. They are part of the AI research. They are contributing to the robot’s learning so that it can better help other people. So it’s really cool to think about there’s this cutting-edge technology and the people helping make it happen and make it better are people who traditionally we don’t associate with cutting-edge technologies. So I think that’s really cool.

 

[00:18:32] JP: I really appreciated the article written from the beta tester, Deanna’s perspective. She knows that this robot isn’t real. It might just look like a lamp. But for her, she can separate the idea that this is just a physical object and yet it makes her feel better and that’s ultimately a win for her mental health. I appreciated that perspective for a bit and I definitely appreciate the idea that, I think, especially in Western culture, we treat the elderly like large children a lot of times, and that’s not helpful for anyone.

 

[00:19:07] SY: That is a really good way to describe it, large children. Oh, that’s really interesting.

 

[00:19:12] JP: Yeah.

 

[00:19:12] SY: Yeah, I can totally see that.

 

[00:19:14] JP: But she can recognize this is an object. I understand it’s just an object, but it makes me feel better and it’s better for my health. And I think that was a fantastic perspective.

 

[00:19:23] SY: So last week we talked about a super powerful GPT-3 powered AI chat bot that people were using in really interesting ways to talk to all kinds of characters and personalities. And coming up next, we talk to Max Jaderberg, Senior Staff Research Scientist at DeepMind, about another fascinating AI tool that, much like us humans, can learn general skills by experimentation and exploration in its 3D playground, XLand, after this.

 

[MUSIC BREAK]

 

[00:20:10] SY: Joining us is Max Jaderberg, Senior Staff Research Scientist at DeepMind. Thank you so much for being here.

 

[00:20:16] MJ: It’s a pleasure to be here. Thanks for inviting me.

 

[00:20:19] SY: So tell us about your career background and what you do at DeepMind.

 

[00:20:23] MJ: So I lead what’s called the Open-Ended Learning Team, a team of scientists and engineers here. I’ve been at DeepMind for coming up to seven years now. Before that, I was doing a PhD in computer vision and a startup, which was acquired by Google to come into DeepMind. And yeah, since being here, been working on a whole host of methods from computer vision through to reinforcement learning and now open-ended learning.

 

[00:20:49] SY: Let’s dig into your research paper entitled “Open-Ended Learning Leads to Generally Capable Agents”. Tell us about the paper and tell us about XLand. Paint us a picture of what it is.

 

[00:21:01] MJ: This paper is sort of coming off the back of a few years of amazing research in reinforcement learning and in particular, looking at some of these reinforcement learning results, which have occurred in simulated environments and particularly video games. We’ve had amazing results in things like Go, StarCraft, Capture the Flag, Dota, all these things where agents learn to play through trial and error, through reinforcement learning, and they get really, really, really good at playing this game and actually they get to human level, superhuman level. They can beat the top pros and it’s really cool. What we get out of these pieces of work is an agent, which is really good at this one particular game, is really good at StarCraft, is really good at Capture the Flag, it’s really good at those two, but that same agent can’t then shift focus and train on StarCraft and then play Dota. And if we’re thinking about this much further afield goal of creating artificial general intelligence, we want an agent which can do many different things, not just one particular thing. So the idea behind this work is to step in this direction of getting more general agents, more generally capable agents. And that’s what this paper is starting to look towards. So one of the key ideas to move in this direction is to, instead of just training on one game like Capture the Flag, you train on hundreds of thousands, even millions of different games. And as a result, the agent can then generalize, can quickly get to grips with a new game that it’s never seen before at test time, and that’s where this XLand environment comes into play.

 

[00:22:37] SY: So tell us more about XLand. How does it work?

 

[00:22:40] MJ: XLand is a 3D first person video game-like environment. It’s all built in Unity. And XLand, you can think of as a bit like an environment space or like a meta-game. It’s got these building blocks, different floors with different colors, different objects that can be picked up and thrown around, player avatars, which have different gadgets, things like freezing objects or tagging other players or objects. And these physical components can be composed in many different ways. You can have different shapes, topologies and maps, different arrangements of objects and then crucially there’s the actual, “What gives you rewards? What gives you scores, plus ones in the game?” And this reward function, as we call it, is composed by a logical arrangement of the different elements of this environment. So for example, the game of hide and seek with two players, you would have two players in XLand and the goal of one of the players would be, “I want to see the other player,” and the goal of the other player is, “I want to be not seen by the other player.” And that defines a game of hide and seek. Instead, you could play this game of like trying to capture a cube where one player has the goal, “I want to take the red cube and put it on the purple floor,” and the other player wants to take the red cube and put it on the blue floor. And suddenly, we have a game of Capture the Flag.
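To make the structure of these games a little more concrete, here is a minimal sketch in TypeScript. It is not DeepMind’s actual code or API; the types and names are invented purely to illustrate the idea Max describes, that a game is just a logical goal per player, evaluated against the simulated world state.

```typescript
// Illustrative sketch only: an XLand-style game is defined by giving each
// player a goal, and a goal is a logical predicate over the world state.

type Player = "player1" | "player2";
type Color = "red" | "blue" | "purple";

// Simplified view of the world state that goals are evaluated against.
interface WorldState {
  sees: (viewer: Player, target: Player) => boolean;
  cubeOnFloor: (cube: Color, floor: Color) => boolean;
}

// A goal is a predicate: does the current state satisfy this player's objective?
type Goal = (state: WorldState) => boolean;

// Hide and seek: one player wants to see the other, who wants not to be seen.
const seekerGoal: Goal = (s) => s.sees("player1", "player2");
const hiderGoal: Goal = (s) => !s.sees("player1", "player2");

// A capture-the-flag-like game: each player wants the red cube on a different floor.
const playerOneGoal: Goal = (s) => s.cubeOnFloor("red", "purple");
const playerTwoGoal: Goal = (s) => s.cubeOnFloor("red", "blue");

// The reward function: +1 whenever a player's goal holds in the current state.
const reward = (goal: Goal, state: WorldState): number => (goal(state) ? 1 : 0);
```

Swapping in different predicates and world elements yields a different game, which is what makes it possible to generate games at the scale Max describes next.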

 

[00:24:03] JP: So you could give the AIs different rules for different games. What were you hoping to accomplish with XLand and this research in giving the AIs different types of game rules?

 

[00:24:17] MJ: Having this environment space where we can automatically and procedurally generate these different game rules, it means we can very easily create a training dataset of games, which is more than one, two, three games. It’s hundreds of thousands. It’s millions of games. And forcing an agent to try and learn to play all of these at the same time during training. Just like in computer vision, training on lots and lots of different images, just like in natural language processing, trying to train on all of Wikipedia and more. By trying to train on all of these different games, rather than just one or two, you’ll get very general behavior out of agents, which are able to play any sort of game, even ones they haven’t encountered during training.

 

[00:25:02] SY: You touched on this a little bit earlier, but I’d like to go into a little bit more detail. Can you compare and contrast what and how the AI is learning in XLand versus AlphaZero, which is beating some of the best chess and Go players? How would you kind of compare those two?

 

[00:25:17] MJ: Yeah. So AlphaZero, it’s one learning algorithm, and then you apply that learning algorithm on a single game, for example, chess, and you get an agent out of that. In contrast, here, we have a single agent with a single learning algorithm being applied to millions of different games and getting a single agent, which can play all of these out. So it’s that sort of generality, which was not there with AlphaZero. Also in terms of the type of games in AlphaZero, this was looking at these sequential two player games like chess, like Go, whereas here we’re talking about a 3D environment with more than two agents running about and doing lots of things.

 

[00:26:00] SY: And going into the how a little bit more, I imagine teaching an AI to do one game can take a lot of time, a lot of processing power, a lot of effort. Going through that same process with hundreds of games, thousands of games sounds expensive. Is that the process or is there a different methodology for teaching AIs multiple games versus the one?

 

[00:26:27] MJ: Yeah. As you can imagine, it is very difficult going from just one game to hundreds of thousands of games. And I think the key thing to point out is not every game is as interesting as every other game to learn from. Some games are very simple. Some games are very complex. Maybe at the beginning of training it’s better for the learning progress of the agent to train on a simple game, with an easy to find positive outcome. Whereas later on in training, maybe it’s better to stop playing on these simple games and actually focus more on these complex games to get more depth of behavior, more interesting behaviors out. But as humans, we don’t know what these neural networks really need. So instead, the idea is we create what we call these open-ended learning processes, which automatically discover and co-evolve what the training distribution of games should be for the agents at every point in training as these training processes go on for a couple of weeks.
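As a rough illustration of the idea in that answer, here is a hedged sketch in TypeScript of a training loop whose distribution over games is re-estimated as the agent learns, rather than fixed in advance. The function names and the weighting rule are assumptions made for illustration, not DeepMind’s published method.

```typescript
// Illustrative sketch only: the training distribution over games is adjusted
// automatically as the agent improves, instead of being fixed up front.

interface Game {
  id: string;
}

// Placeholder hooks that a real training system would supply.
function trainOneEpisode(game: Game): void {
  // Run one reinforcement learning episode on this game and update the agent.
}

function estimateSuccessRate(game: Game): number {
  // Evaluate the current agent on this game; here just a stand-in value.
  return Math.random();
}

// Weight each game by how much learning signal it is likely to provide:
// games the agent always wins or always loses get little weight.
function updateWeights(games: Game[]): Map<Game, number> {
  const weights = new Map<Game, number>();
  for (const game of games) {
    const p = estimateSuccessRate(game);
    weights.set(game, p * (1 - p));
  }
  return weights;
}

// Sample a game in proportion to its current weight.
function sampleGame(weights: Map<Game, number>): Game {
  const total = [...weights.values()].reduce((a, b) => a + b, 0) || 1;
  let r = Math.random() * total;
  for (const [game, w] of weights) {
    r -= w;
    if (r <= 0) return game;
  }
  return [...weights.keys()][0];
}

// The open-ended loop: keep training while periodically re-estimating which
// games are currently worth training on.
function train(games: Game[], totalEpisodes: number): void {
  let weights = updateWeights(games);
  for (let episode = 0; episode < totalEpisodes; episode++) {
    trainOneEpisode(sampleGame(weights));
    if (episode % 1000 === 0) {
      weights = updateWeights(games);
    }
  }
}
```

The point of the sketch is only the shape of the loop: what counts as a useful game changes over the course of training, and the process discovers that automatically rather than relying on a hand-designed curriculum.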

 

[00:27:27] JP: So this paper represents a lot of research. What were your major findings?

 

[00:27:32] MJ: Number one, that if we have this big, vast and diverse environment space like XLand. And number two, that we create these open-ended learning processes with sort of state-of-the-art reinforcement learning at the core of that. Then what we get out are agents, which are actually really what we call generally capable, they’re able to play lots and lots of these challenges, participate in all of the challenges in this space and actually end up learning really interesting generic behaviors, things like experimentation. Agents might not know exactly how to solve a game or solve a challenge. They’ll try many things. They’ve learned this as a general heuristic behavior of trying lots of different things. And if it actually manages to succeed, it can recognize when it succeeded and stop experimenting and keep things still.

 

[00:28:24] JP: Is this the first time that we’ve been able to see AI adapt and learn this well?

 

[00:28:32] MJ: I think it’s one of the first times we’ve seen reinforcement learning produce agents, which are zero-shot generalizing. So without ever seeing these tasks before, having their behavior work and be competent on these new tasks that haven’t been seen before.

 

[00:28:50] SY: So it’s mostly kind of bumping into things, doing random things, learning from a lot of trial and error which frankly is how a lot of humans learn. That’s how we do things. We try. We practice. We modify. We update. When you think about the future of training and AI in the future of these agents, do you feel like that’s going to continue to be the way we train an AI? Or do you feel like we’ll make some changes, advancements to how we train them? Is it just about doing the same process and just doing it faster with better sets of data? Or do you think it’ll actually be different?

 

[00:29:25] MJ: I think this idea of trial and error is really going to be at the core of things, but we can do this in a more and more intelligent way, and we can even have humans coming into the training process, almost guiding this trial and error process, this reinforcement learning process. I really see reinforcement learning as a very effective mechanism and a mechanism crucially, which scales with the compute power that we have. It’s brilliant to have a method where as computers get faster, we have access to more computers. These methods will get better and better and better. And we’ve seen this in supervised learning and GPT-3 is a great example of this. And there’s no reason why we shouldn’t see this sort of advancement of results as we scale our compute with reinforcement learning at the core as well.

 

[00:30:15] JP: It’s great that you just mentioned GPT-3. Recently, we interviewed a creator of a chatbot engine that’s using GPT-3 and he was explaining to us some of the large advances that are being made with language processing and AI learning to converse with people. What kind of things do you think we’ll see either as subjects of research or potential applications for your work and other projects like GPT-3 in the future?

 

[00:30:49] MJ: Yeah. GPT-3, the modality is language. The way that you and me can interact with this chat bot is through language. The analog here for what we’re doing with this paper and what hopefully will follow is the interaction is through this 3D simulated environment. So we’ll see humans and agents interacting through the embodiment in the game world, communicating with each other, setting challenges for each other and informing each other, cooperating, competing, and crucially displaying this sort of adaptive behavior that we’re seeing in GPT-3 in language. We’ll see that in actions and movements and strategies within this 3D simulated world.

 

[00:31:36] SY: So what’s next for your research?

 

[00:31:38] MJ: We’re really excited about these results and we’re really excited to see where this train of thought goes where as we add complexity to the environment and as we scale up training, as we add more sophisticated training mechanics, how this general behavior continues to emerge, how we can move even further towards these agents that adapt really intelligently online to new unseen challenges. This is really what’s next for us is continuing to build on this platform to move towards that.

 

[00:32:10] SY: Well, thank you so much for being here.

 

[00:32:11] MJ: Thanks a lot for having me.

 

[00:32:18] SY: Coming up next, we chat with Suzanne Aitchison, Software Engineer and Accessibility Specialist here at Forem, about Android 12 beta’s Camera Switches, which lets users control their phone with facial expressions, after this.

 

[MUSIC BREAK]

 

[00:32:44] SY: Joining us is Suzanne Aitchison, Software Engineer and Accessibility Specialist here at Forem. Thank you so much for joining us.

 

[00:32:51] SA: Hi. Nice to be here.

 

[00:32:53] SY: So tell us about your developer background.

 

[00:32:55] SA: Yes. So I actually started out my career as a quality assurance engineer, specifically with Android mobile, and then just like one thing happens after the other and I landed working more in the web and then the front end with React and then stumbled upon accessibility quite far into my career. I was working with a client who had some particular accessibility needs and my organization did not know how to deliver them. So out of necessity, I started learning about accessibility, and since then, I think it’s one of these things that once you know a little bit, you can’t really stop. So now further down the line, I specialize really in the front end and with a focus on web accessibility. In the background, I’m still studying. I’m still learning more. It’s a big field. So I’m currently pursuing a certification to become like a bona fide web accessibility specialist.

 

[00:33:57] SY: Oh, cool!

 

[00:34:00] JP: Tell us about the kinds of things you work on at Forem.

 

[00:34:02] SA: Yes. So at Forem, I work mostly, when we think about the Forem app, it’s used in a lot of places. So desktop gets quite like heavy use, but also people do use it on their mobiles. So I work mostly in the front end of the code base. So everything, JavaScript, HTML, lots of sort of shiny interactivity on the front end. And of course, through all the efforts on that, what we’re bearing in mind is like, “How do different users want to use this feature? How do we make this work with keyboard? How do we make this work with a screen reader?” So we spend a lot of time diving into sort of these niggly details of, “Okay, it looks right, but does it work right for everybody?”

 

[00:34:46] SY: So let’s get into Android’s Camera Switches. What is it and what are your initial thoughts?

 

[00:34:51] SA: Yeah. So this is a new feature for Android, and I’m very excited about it from what I’ve heard so far. So if people haven’t heard about switches before, like a switch, you can think about it as like a manual button that you would push. There’s a whole range of different switches available. You get ones where you might puff or sip into a straw. You might kick it with your foot. These things are designed for just whichever part of a person’s body is most able to sort of reliably reproduce an action. So you can imagine at the moment, quite commonly, it would be like just a big red button and a switch would allow you to interact with your touch screen on your mobile phone by sort of plugging in this hardware device and then you can use that to scan through your icons to find the right one to activate. Once you’re in an app, you can use it in different ways to scroll the page, to select a button, a link. All of this stuff that you would normally be tapping on your screen to do. But what this new feature brings in is sort of a gesture-based control based on what your facial expressions are doing. So the idea is that for some users, I mean, this won’t work for everyone who uses a manual switch, because you have to have that solid control of those expressions, which not everybody would be able to have, so it might not be the best for everyone. But for a large subset of those users, they might want to use these facial expressions instead. So instead of pushing a manual button, they’re going to look up, they’re going to smile or look to the sides. Blink I think is one of the expressions you can use as well. And then your phone will recognize that through your front camera and it will have an associated action that you have defined. So this is all like very customizable. So maybe I’ve set it so that when I look down, it’s going to scroll down for me. Or maybe when I look right, it’s going to go to the next interactive item. With only your face and your camera, you can eliminate the need for this extra piece of hardware that you’ve maybe been carrying around and you can interact with your mobile phone without touching the screen.
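To help picture the kind of per-gesture customization Suzanne describes, here is a purely illustrative sketch in TypeScript. It is not the actual Android Camera Switches API, which users configure through the accessibility settings rather than code; the gesture names, actions, and thresholds are invented to show the shape of a gesture-to-action mapping with adjustable sensitivity.

```typescript
// Purely illustrative: a configurable mapping from recognized facial gestures
// to switch actions, with per-gesture sensitivity and hold-time thresholds.

type Gesture = "lookUp" | "lookDown" | "lookLeft" | "lookRight" | "smile" | "blink";
type SwitchAction = "next" | "previous" | "select" | "scrollUp" | "scrollDown";

interface GestureBinding {
  action: SwitchAction;
  minConfidence: number; // how pronounced the expression must be (0 to 1)
  holdMillis: number;    // how long it must be held before it triggers
}

// Each user chooses their own bindings; unassigned gestures stay null.
const bindings: Record<Gesture, GestureBinding | null> = {
  lookDown:  { action: "scrollDown", minConfidence: 0.7, holdMillis: 300 },
  lookUp:    { action: "scrollUp",   minConfidence: 0.7, holdMillis: 300 },
  lookRight: { action: "next",       minConfidence: 0.6, holdMillis: 200 },
  lookLeft:  { action: "previous",   minConfidence: 0.6, holdMillis: 200 },
  smile:     { action: "select",     minConfidence: 0.8, holdMillis: 400 },
  blink:     null,
};

// Called when the camera pipeline detects a gesture with some confidence that
// has been held for some duration; returns the action to perform, if any.
function resolveAction(
  gesture: Gesture,
  confidence: number,
  heldMillis: number
): SwitchAction | null {
  const binding = bindings[gesture];
  if (!binding) return null;
  if (confidence < binding.minConfidence) return null;
  if (heldMillis < binding.holdMillis) return null;
  return binding.action;
}
```

The thresholds correspond to the sensitivity and hold-duration settings Suzanne mentions later: they are what keep a passing glance or a small smile from accidentally triggering an action.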

 

[00:37:11] JP: Do you know if Android developers have to do any work to add support for this to their apps? Or is this all being done at the OS level?

 

[00:37:18] SA: It should be an OS-level change, on the basis that their apps are currently accessible to switch devices, which hopefully they are. So you have normally anything that works with the keyboard API. I guess you can call it a keyboard API, even though it’s on mobile. It’s not the same as having this hardware keyboard, but there are gestures baked into Android devices, all mobile devices that allow you to interact in different ways with the interactive content. So for example, if you use a screen reader on Android, if you use TalkBack, which is a baked in one, you might be swiping across the screen from left to right to move between these interactive elements. That same API is going to be used under the hood to connect these new gestures, to move it onto the next thing. So really, if our apps and our websites are all made accessibly, we’re keeping in mind these different modes of engagement, particularly thinking about keyboard and screen reader access. If we’ve tested these things, it should just work.

 

[00:38:23] SY: So what are some potential problems that you foresee with a facial recognition feature like this? Because ultimately, this is AI, this is facial recognition. Do you see any issues with that?

 

[00:38:34] SA: Yeah. I mean, I’m sure people are going to raise concerns about if any of this is being collected and what it’s being used for. I’m sure there will be privacy settings to customize that. There’s also difficulties with any sort of facial recognition, especially when we’re talking about people who have motor disabilities, they likely have a loss of some kind of motor control, it might not be a reliable alternative for them to a switch device. I think it will be different for each user. I think the good thing about the feature is that there’s a lot of different options. So they’re not saying, “Okay, we’re replacing the switch control with a blink and a smile.” Maybe one of those doesn’t work for you just because of either the technology itself or maybe just your own preferences and abilities. There is a range of gestures baked in and it’s nice that they’ve also baked in like some… you can customize how sensitive these controls are. When we say you have to smile to activate this thing, how big a smile are we talking? How long do you have to hold this smile for? And those things are all customizable so that hopefully people aren’t accidentally activating it, but also hopefully the bar isn’t set too high in terms of the movement that’s expected for a user.

 

[00:40:02] SY: So what other things do you think could be added and improved upon with mobile accessibility in general?

 

[00:40:09] SA: I think it’s great to see these gesture-based controls coming in. I do really like that. I think a lot of the work to be done with mobile accessibility really falls on less the operating system level and more is individual developers because these tools and these APIs are only as good as the code that they’re being consumed with. So all of these great features don’t work if we don’t code things accessibly.

 

[00:40:39] JP: Can you talk about some of the biggest things that developers often forget about when it comes to accessibility and let our audience know what sort of things they should be looking out for and developing for?

 

[00:40:50] SA: There’s some fairly low-hanging fruit out there, things like making sure that any images have accessible descriptions and that they’re useful accessible descriptions. So there’s a lot of code out in the wild where the description that’s perceived by screen reader users would be something like “image” or “logo”. It’s not helpful. And I think this sort of thing happens because maybe, as developers, especially when we start out, we might not be fully aware of how these things are surfaced to users. So looking out for things like alternative texts, but also just about using standard controls. I think this gets a lot more into the weeds when we think about the browser experience on mobile because Android developer tools are arguably better at guiding and linting and sort of driving developers towards the right elements to use, but out on the web it’s almost a little bit of a free for all because you can make almost any HTML element look a certain way or sort of visually appear to be behaving in a certain way with JavaScript or CSS. But if that control isn’t by default an interactive control, it’s not going to be perceived by these APIs in a correct way. For example, using buttons when we should be using buttons, not using elements like a div or a span. These things are really important, but visually, they might look okay to abled users. They might be working fine for abled users. But especially when you get down to mobile phone level where your users are also contending with your smaller touch screen. So these targets are smaller. It’s a lot harder to parse all this content that we’re putting into these apps. And then if we add these additional obstacles of, “I can’t actually get to that button using my switch device, using my gesture controls, using my screen reader,” because it’s not really a button. That in the end is going to be the ultimate stumbling block.
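As a small illustration of that last point, here is a sketch in TypeScript/React (React being the stack Suzanne mentions working with, though the components here are invented for the example). The styled div can look identical to the real button, but only the native button is exposed to the keyboard API, switch devices, and gesture controls discussed above, and only a useful alt text helps screen reader users.

```tsx
import * as React from "react";

// Hypothetical helper used by the examples below.
function save(): void {
  // persist changes
}

// Looks like a button, but is not focusable, not keyboard-operable, and not
// announced as a button to assistive technology or switch/gesture controls.
const BadSaveButton = () => (
  <div className="btn" onClick={() => save()}>
    Save
  </div>
);

// A native button gets focus, keyboard activation, and the correct role for free.
const GoodSaveButton = () => (
  <button type="button" className="btn" onClick={() => save()}>
    Save
  </button>
);

// Images need descriptions that are actually useful, not just "image" or "logo".
const SiteLogo = () => <img src="/logo.png" alt="DevNews podcast logo" />;
```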

 

[00:43:07] JP: Is there anything else we didn’t mention here today that you’d like to talk about?

 

[00:43:11] SA: Yeah. I just think it’s good to be cognizant overall of these different modes of engagement with the content that we produce as developers. I think as well it’s exciting to think about the fact that eventually this might not be considered just an accessibility feature. When you think about some other accessibility features that are more embedded and commonplace like say captions for videos, that’s just standard now and a lot of people, myself included, will use them. I’m not hard of hearing, I’m not deaf, but I just like it. It helps me follow. It helps me pay attention. I just like it. I mean, I think at the moment these new gesture-based controls are embedded quite deep in the accessibility settings. You really have to know that you’re looking for it to get there. But eventually, we might see this becoming more commonplace and it could help a whole range of people. People nursing babies, their hands are full, they want to still interact with their phone, maybe they could make use of this feature, all different kinds of users.

 

[00:44:22] SY: Well, thank you so much for joining us today.

 

[00:44:24] SA: No problem.

 

[00:44:36] SY: Thank you for listening to DevNews. This show is produced and mixed by Levi Sharpe. Editorial oversight is provided by Peter Frank, Ben Halpern, and Jess Lee. Our theme music is by Dan Powell. If you have any questions or comments, dial into our Google Voice at +1 (929) 500-1513 or email us at [email protected]. Please rate and subscribe to this show wherever you get your podcasts.