August 22, 2016
Michael Buckwald is the cofounder and CEO of Leap Motion.
Michael Buckwald of Leap Motion explains why the future of AR interfaces is pointing, not clicking.
PwC: How did Leap Motion get started, and what are you doing now?
Michael Buckwald: The original vision for Leap started seven or eight years ago with our cofounder and CTO David Holz. David was working on a 3-D model. He realized quickly that creating something really simple like a coffee cup took longer on a computer than it would take a five-year-old to create the same thing out of clay. That led to this frustration that even though technology is supposed to make people better and faster, it often gets in the way. That also led us to the realization that the problem is not that technology isn’t powerful enough.
“Everyone has a supercomputer in their pocket, but the ways that people communicate or talk to technology are deeply simplistic and primitive.”
Everyone has a supercomputer in their pocket, but the ways that people communicate or talk to technology are deeply simplistic and primitive. A person is either touching a touch screen or not touching it. Clicking it, or not clicking it. Compare that to how people use their hands in the physical world every day—even very basic actions like reaching out and drinking from a cup. Leap Motion breaks down all the constituent actions that are necessary to do that: reaching out, grabbing the cup, moving it. Those actions are automatic to humans, and they succeed 100 percent of the time. But in reality, those actions are very complicated.
That started us on a path of saying: Maybe we can find some way to technologically track a hand and the fingers accurately enough so people can use a virtual hand to touch virtual content the same way that people use their actual hands to interact with the physical world. That led to many years of research, because tracking a hand and fingers turns out to be very much a nontrivial problem.
PwC: How does your product work?
Michael Buckwald: Whereas almost everyone else tries to solve these sorts of problems with hardware, we use software. We use two off-the-shelf VGA cameras and some LEDs, which cost just a few dollars. It’s the software that’s obviously very, very complicated.
We began tracking the hands and fingers with our first product, released three years ago. We basically tracked just the fingertips. Through updates to our software, we have progressed to tracking the fingers themselves, including all the joints of the hands, and now we’re focused on applying that technology to different industries such as VR [virtual reality] and AR [augmented reality].
PwC: Where do the cameras reside?
Michael Buckwald: The first device had the camera sitting on the desk, looking up at the ceiling, and the device was plugged into a computer. But for VR and AR, we think the best experience is having the camera mounted on a person’s head, so they can interact with anything they look at. Our primary model is working with headset OEMs [original equipment manufacturers] to embed the technology in their devices.
PwC: Where does your software reside? Is it embedded in the device or in the cloud?
Michael Buckwald: The focus for us is being embedded in other OEM devices. If the device is a tethered VR headset that connects to a PC, then our software is on the PC. If it’s a mobile headset, it would run on the processor on the device.
PwC: You said there are other approaches to solving this problem. What are the other ways that people accomplish this kind of tracking?
“If people will be wearing AR headsets for hours a day, then they’ll need a rich and powerful gesture-based interaction system that’s always with them.”
Michael Buckwald: There are other technologies that people use to track bigger objects and grosser movements, as already used in some gaming systems. They typically use an approach called time of flight, where one pulses a light source and measures how long it takes the photons to bounce back. That approach is accurate enough to track big movements like using the hands to swing a ping-pong paddle, but it’s not accurate enough to track fingers. Some other approaches record gross gestures, such as whether a person is swiping left or right or up or down, but those approaches aren’t actually tracking the motion of the hands and fingers in real time.
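The time-of-flight relationship Buckwald describes reduces to a simple formula: light travels to the object and back, so distance is the speed of light times the round-trip time, divided by two. A minimal sketch (the function name and example timing are illustrative, not from any particular sensor's API):

```python
# Sketch of the basic time-of-flight relationship: a pulsed light source
# travels to the object and back, so the measured round-trip time maps to
# distance as d = c * t / 2.

C = 299_792_458.0  # speed of light in m/s

def tof_distance_m(round_trip_seconds: float) -> float:
    """Distance to the reflecting surface, given round-trip photon time."""
    return C * round_trip_seconds / 2.0

# A hand about 0.5 m from the sensor reflects photons in roughly 3.3
# nanoseconds. Resolving millimeter-scale finger motion would require
# picosecond-level timing precision, which is why time of flight handles
# gross movements well but struggles with individual fingers.
print(tof_distance_m(3.34e-9))  # roughly 0.5 m
```

The nanosecond scale of the round trip makes the trade-off concrete: a ping-pong swing moves the hand tens of centimeters, well within the timing resolution, while a finger curling a few millimeters is lost in the noise.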
VR and AR are demanding spaces. They require extremely low latency, the ability to track 10 fingers, and the ability to let people grab, push, and pull. That involves tracking the fingers even when one can’t actually see them. If somebody is grabbing and releasing something and the sensor is mounted on the head, we must be able to determine that their fingers are unclasping and that they’re letting go of an object even when the human eye from that position wouldn’t be able to see the fingers.
PwC: What problems are you focused on solving now? You said you’ve gone from tracking fingertips to tracking all the joints of the hand. Where do you go from here?
Michael Buckwald: To model actions like grabbing and releasing, we’ve needed to train on more than 70,000 permutations of grabbing and releasing, because there are so many different subtle ways that people do those things. We’ve built some of our own demos in-house. The goal in the near future is to take our findings and make them available to developers, so a developer can easily replicate these physical interactions in the experiences they build on their own.
We’re very, very excited about AR in the future as well. If people will be wearing AR headsets for hours a day, then they’ll need a rich and powerful gesture-based interaction system that’s always with them.
PwC: Why is tracking so complex? Is it about different sizes or levels of hardness in the objects?
“We want people to be able to grab the same way that they grab objects in the real world and just have it work.”
Michael Buckwald: It’s not about the objects. It’s about the ways that people grab. Sometimes they grab with three fingers elevated at a particular angle, sometimes with the whole hand completely closed, and sometimes they grab and release by closing the hand only a few millimeters and opening it a few millimeters—that sort of variance. We don’t want to tell users that there is a right way and a wrong way to do it. We want people to be able to grab the same way that they grab objects in the real world and just have it work.
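One way to accommodate that variance, sketched below, is a hysteresis-based detector: separate grab and release thresholds on a normalized hand-closure value, so both a full fist and a few-millimeter pinch can trigger, and jitter near the threshold doesn't flicker the state. This is an illustrative heuristic, not Leap Motion's actual trained model; the class and threshold values are assumptions:

```python
# Hypothetical hysteresis-based grab detector. "closure" is assumed to be
# a normalized hand-closure value: 0.0 = fully open, 1.0 = fully closed.
# Using distinct grab/release thresholds means there is no single "right
# way" to grab, and small fluctuations don't toggle the grab state.

class GrabDetector:
    def __init__(self, grab_threshold: float = 0.7,
                 release_threshold: float = 0.4):
        self.grab_threshold = grab_threshold
        self.release_threshold = release_threshold
        self.grabbing = False

    def update(self, closure: float) -> bool:
        # Enter the grabbing state only above the upper threshold;
        # leave it only below the lower one (hysteresis).
        if not self.grabbing and closure >= self.grab_threshold:
            self.grabbing = True
        elif self.grabbing and closure <= self.release_threshold:
            self.grabbing = False
        return self.grabbing

detector = GrabDetector()
readings = [0.2, 0.75, 0.6, 0.5, 0.3]
states = [detector.update(c) for c in readings]
print(states)  # grab starts at 0.75, holds through 0.6 and 0.5, releases at 0.3
```

The gap between the two thresholds is what lets a user who grabs "only a few millimeters" keep hold of an object while their closure value wavers, which is the kind of forgiving behavior the interview describes.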
PwC: Are there other things you want to be able to track besides hands and fingers?
Michael Buckwald: We think hands and fingers definitely offer the highest tracking fidelity, but tracking the tools people hold in their hands can be interesting, too. Being able to track items like a stylus or a pen, or a plastic gun in a game—that’s interesting for us.
PwC: Can you talk about some of the AR use cases you anticipate for this technology?
Michael Buckwald: We think of our technology as a fundamental input system for people who will be walking around wearing the devices for hours at a time. A core part of a person’s AR experience obviously will be looking at their own hands through a transparent display and seeing their hands interact with content that is projected in front of them. We really think of hands as the primary input method for AR, maybe alongside voice recognition. Hands will be used for simple things like virtual button presses, but developers will also use the SDK [software development kit] to build richer and deeper interactions that involve grabbing, rotating, and manipulating virtual objects.
PwC: Looking at motion tracking as an industry, what challenges remain to be solved before it can become a mainstream technology?
Michael Buckwald: I think it’s more about a parallel market like AR developing. The technology is ready for the mass market, but there obviously must be a specific use case for tracking hands by themselves. I think that if VR and AR become mainstream, the hands will become a primary interface device for them.