August 19, 2016
Amir Rubin is the cofounder and CEO of Paracosm.
Amir Rubin of Paracosm describes how 3-D models and maps of work environments will expand augmented reality capabilities and solutions.
PwC: Amir, can you please introduce yourself and your company?
Amir Rubin: Sure. I am the cofounder and CEO of Paracosm. Paracosm is a 3-D mapping company, and we develop software that takes data from 3-D depth sensors and generates very detailed maps and 3-D models of large spaces. For example, using a depth sensor, we can grab a video of a home, an office, a retail space, a construction site, or a large industrial facility, and Paracosm software will process that video and generate a 3-D model of the space. Such a model can then be used for a variety of purposes, such as in augmented reality [AR] applications.
PwC: Are you providing hardware as well?
Amir Rubin: Paracosm develops only the software, and we rely on third-party 3-D sensors. The sensors need to have a 3-D camera that uses an infrared laser to sense the world in 3-D. Currently, not many 3-D sensors are on the market. That will change, as they’ll soon be embedded in mobile phones, tablets, and other common devices.
PwC: Do you think 3-D sensors eventually will become part of smartglasses?
Amir Rubin: I think the 3-D sensors will bring a key capability to augmented reality or mixed reality headsets. These headsets will be able to display fully 3-D objects to the user. The combination of these next-generation AR headsets, 3-D sensors, and software such as what Paracosm is developing can enable some really interesting capabilities, such as placing a virtual object into a room, and placing it precisely.
For example, I can put a virtual flower vase onto the table. Or, through my headset, I can have a virtual dinosaur stomping around my living room. Or a virtual tour guide can walk me through a museum.
“To make all this technology useful for enterprise work, workflows should be embedded into the AR display—workflows that make a job easier for people on the factory floor or in an industrial plant or at a construction site.”
PwC: How do you rate the 3-D performance of smartglasses available today?
Amir Rubin: The hardware and software pieces are finally falling into place. Some of the headsets that are on the market and coming out this year can do 3-D pretty well. The vendors have built absolutely amazing systems that have integrated 3-D sensors and a next-generation 3-D display.
What’s missing is the system integration of all of these different pieces into a seamless solution. The other barrier right now is customer adoption. To make all this technology useful for enterprise work, workflows should be embedded into the AR display—workflows that make a job easier for people on the factory floor or in an industrial plant or at a construction site. People know what those specific workflows need to be, but they haven’t been developed for use on AR glasses yet because everything is still so new.
PwC: How does the Paracosm software help with those workflows?
Amir Rubin: First we capture the full structure of the environment, such as a home, office, industrial plant, museum, construction site, or whatever. Even if it’s a snapshot in time, we want to know the dimensions, the floor plan, where the big objects are, and how everything is laid out. Then we can use that as the skeleton to place augmented information into and to plan a workflow.
Once users have a map, they’ll want to ask the question: where am I? If someone is wearing one of these headsets that has a 3-D sensor on it—or maybe it’s a regular camera—or if someone is holding up a phone, we want to match the view of the headset or the phone in real time to its location on the pre-generated map. Orientation and navigation are the two pieces Paracosm has spent the past three years building.
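To illustrate the localization step Rubin describes, the sketch below aligns 3-D points seen in the current camera frame with their corresponding points in a pre-generated map, recovering the device's rotation and translation. This is a simplified stand-in (the Kabsch rigid-alignment algorithm, assuming known point correspondences), not Paracosm's actual tracking pipeline, which must also find the correspondences themselves.

```python
import numpy as np

def estimate_pose(map_points, frame_points):
    """Given N corresponding 3-D points (rows) in the pre-built map and in
    the current camera frame, return the rotation R and translation t such
    that map_point ~= R @ frame_point + t (Kabsch algorithm)."""
    cm = map_points.mean(axis=0)            # centroid of map points
    cf = frame_points.mean(axis=0)          # centroid of frame points
    # Cross-covariance of the centered point sets
    H = (frame_points - cf).T @ (map_points - cm)
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so R is a proper rotation
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cm - R @ cf
    return R, t
```

With the pose in hand, virtual content authored in map coordinates can be transformed into the headset's view, which is what makes precise placement of virtual objects possible.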
We’re starting to work on another piece. Let’s pretend I scanned my house. I don’t want it to be just a 3-D map, and I don’t want to just track my position. I want my app, software, or robot to recognize my couch—to pick out the couch from the scene. The capability to identify the specific landmarks and items in the scene then unlocks the deeper layer. People are still working on that.
“I want my app, software, or robot to recognize my couch—to pick out the couch from the scene. The capability to identify the specific landmarks and items in the scene then unlocks the deeper layer. People are still working on that.”
PwC: How do you expect the adoption of your technology to take place?
Amir Rubin: We’re envisioning that the technology will find initial use in fixed environments where the main elements won’t change too often. Examples include industrial plants where the heavy equipment doesn’t move around too much or places where we can scan frequently as things change, such as on a construction site. Other interesting environments are museums, airports, and so on.
For such environments, we can build a map ahead of time. Having the map ahead of time is significant, because then someone can preplan the important workflow. Doing everything on the fly is not always desirable. If I’m in the warehouse and I’m picking an order, I want the system to map my route through the warehouse and show me where to go. If I’m on a construction site, I want to see a 3-D overlay of the construction drawings and CAD models of what I need to build on the exact spot it needs to be. To do that, the warehouse or job site must be pre-mapped. For a virtual tour guide in a museum to suggest, “Let’s walk to this painting and I’ll tell you about it,” the museum must be pre-mapped.
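The warehouse order-picking scenario above depends on exactly this kind of pre-built map: once the floor plan is known, routing becomes a classic graph search. The sketch below is a minimal illustration (breadth-first search over an occupancy grid, all names hypothetical), not a description of any vendor's routing engine.

```python
from collections import deque

def plan_route(grid, start, goal):
    """Breadth-first search over a pre-mapped floor grid.
    grid[r][c] is True where the cell is walkable (an aisle); returns the
    shortest list of (row, col) cells from start to goal, or None."""
    queue = deque([start])
    came_from = {start: None}
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:        # walk parents back to start
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < len(grid) and 0 <= nc < len(grid[0])
                    and grid[nr][nc] and (nr, nc) not in came_from):
                came_from[(nr, nc)] = (r, c)
                queue.append((nr, nc))
    return None
```

An AR overlay would then render this cell sequence as a guide line on the warehouse floor in the worker's headset.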
In these environments, we also can add supplemental sensors to help with orientation and navigation. For example, in a big factory, we can place Bluetooth beacons unobtrusively. We can stage the environment, so when the workers are walking through the space, the technology solution is not purely dependent on the camera view. We can grab some data from a Bluetooth beacon; maybe the system recognizes a pattern marker on the ceiling or the wall. Not only is the environment fixed, but we can enhance it.
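One common way such beacon data is used, sketched below, is to turn signal strength (RSSI) into a rough distance via the standard log-distance path-loss model and combine several beacons into a coarse position estimate that a camera-based localizer can then refine. The calibration values and function names here are illustrative assumptions, not part of Paracosm's product.

```python
def beacon_distance(rssi_dbm, tx_power_dbm=-59, path_loss_exp=2.0):
    """Estimate distance in meters from a Bluetooth beacon's RSSI using the
    log-distance path-loss model. tx_power_dbm is the calibrated RSSI at
    1 m; both defaults are typical illustrative values."""
    return 10 ** ((tx_power_dbm - rssi_dbm) / (10 * path_loss_exp))

def coarse_position(readings, beacon_positions):
    """Weighted centroid of known beacon positions, weighted by inverse
    estimated distance -- a rough prior, not a precise fix."""
    total, acc = 0.0, [0.0, 0.0]
    for beacon_id, rssi in readings.items():
        w = 1.0 / max(beacon_distance(rssi), 0.1)  # nearer beacons count more
        x, y = beacon_positions[beacon_id]
        acc[0] += w * x
        acc[1] += w * y
        total += w
    return (acc[0] / total, acc[1] / total)
```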
Over time, our solution can be used in more dynamic environments that change frequently. We either build the model in real time or scan the spaces more frequently. That will depend on the use case.
PwC: Advancements in SLAM [simultaneous localization and mapping] algorithms have opened up robotic applications to many new use cases. What role does SLAM play in your solution?
Amir Rubin: Effectively, Paracosm’s tracking algorithms are a customized SLAM system. When we perform comprehensive 3-D mapping ahead of time, we don’t need to do simultaneous localization and mapping.
The big idea for us and for enterprise AR is to build the best mapping algorithm, the best mapping system, we can build. We do that ahead of time offline, because it could take a little longer to process. Once we have that map, then in real time when the users are moving around and working, we use a modified SLAM system to localize the users, giving them the orientation and navigation. Because we can map the facility ahead of time, we split the problem in half instead of trying to do it all at the same time.
“For me, the dream is that AR will start to replace screens—including a phone, tablet, or TV screen— because the image will be right there on a user’s headset.”
PwC: How does the mapping process work?
Amir Rubin: All 3-D sensors sense the environment as a series of 3-D points, and that collection of points is called a point cloud. Our software takes data from 3-D sensors scanning spaces that can cover tens of thousands of square feet, and it fits all the point clouds together optimally through a 3-D reconstruction process. The resulting 3-D model represents the physical space.
The problem is that processors tend to choke on a point cloud, because it contains so much data. So we create a mesh on top of it. A mesh is like connecting the dots of the point cloud with triangles: it consists of triangles instead of points.
Right now we use pure computer graphics and some geometry to make a mesh that concentrates detail, meaning many triangles, in areas of interest, such as a ladder, a machine, or a piece of furniture. We try to be intelligent: the walls and floors may not have any detail we really care about, so we keep them as simple as possible. The final piece of the puzzle is texturing the mesh, which consists of these triangular faces. We basically paste the color imagery from the camera onto the triangular faces to try to make the model photorealistic.
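The two data-reduction steps Rubin describes, thinning the raw point cloud and then connecting the remaining points with triangles, can be sketched as follows. This is a deliberately simplified illustration (voxel-grid downsampling plus triangulation of a regular grid of points); real reconstruction pipelines use far more sophisticated surface-fitting, and the function names are hypothetical.

```python
import numpy as np

def voxel_downsample(points, voxel=0.05):
    """Keep one representative point per voxel (5 cm cubes assumed here),
    cutting the data volume that would otherwise choke the processor."""
    keys = np.floor(points / voxel).astype(int)   # voxel index per point
    _, idx = np.unique(keys, axis=0, return_index=True)
    return points[np.sort(idx)]

def grid_mesh(rows, cols):
    """Triangulate a rows x cols grid of vertices: each grid cell becomes
    two triangles, 'connecting the dots' of the point cloud. Returns
    triangles as (i, j, k) vertex-index triples."""
    tris = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            i = r * cols + c
            tris.append((i, i + 1, i + cols))
            tris.append((i + 1, i + cols + 1, i + cols))
    return tris
```

Texturing then amounts to assigning each triangle a patch of the color imagery, which is what the OBJ discussion below covers.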
PwC: In what format do you store your data? Are there standards emerging?
Amir Rubin: We found that representing the large-area model in the OBJ format works fine for us. The mesh file is the structure of the room. It comes out as a mesh of polygon faces, where the size and density of the polygons depend on the level of detail in each area. For the textures, we now have some good ways to take the source images—the photographs from the video feeds—and map them as photo textures onto the faces. The standard OBJ file format handles all that. We use a secondary custom data structure to track the user’s position as the user walks through the space.
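For readers unfamiliar with OBJ, the sketch below writes a minimal textured mesh in that format: `v` lines for vertices, `vt` lines for texture coordinates, and `f` lines for faces that index both (OBJ indices are 1-based). It assumes one texture coordinate per vertex for brevity and omits the `.mtl` material file that binds the texture image.

```python
def write_obj(path, vertices, uvs, faces):
    """Write vertices [(x, y, z)], texture coords [(u, v)], and faces
    [(i, j, k)] (0-based vertex indices) as a minimal Wavefront OBJ file.
    Assumes uv index i pairs with vertex index i."""
    with open(path, "w") as f:
        for x, y, z in vertices:
            f.write(f"v {x} {y} {z}\n")
        for u, v in uvs:
            f.write(f"vt {u} {v}\n")
        for face in faces:
            # OBJ face entries are vertex/texcoord, both 1-based
            f.write("f " + " ".join(f"{i + 1}/{i + 1}" for i in face) + "\n")
```

Because OBJ is plain text and nearly universally supported, a model exported this way can be loaded by most 3-D tools and game engines without conversion.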
“Eventually, all the indoor spaces will be mapped in the same way outdoor spaces currently are mapped.”
PwC: How do you see the AR solutions evolving?
Amir Rubin: For me, the dream is that AR will start to replace screens—including a phone, tablet, or TV screen—because the image will be right there on a user’s headset. Maybe a phone will be a processing unit in someone’s pocket, but I can see a future where AR replaces screens.
Eventually, all the indoor spaces will be mapped in the same way outdoor spaces currently are mapped. Then, when users are wearing their smartglasses, any app that they’re running can be seamlessly tied into the room they’re in. When someone walks into an office building, for example, a virtual receptionist can walk that person to a meeting room. Or, visually impaired people can have audible instructions to guide them through a facility. Telepresence applications also become possible, where a worker walks into a space to do a repair job, and an expert on the other side of the world looks through the worker’s eyes during the repair work. The expert can annotate the scene to show the worker exactly what to do.
Users will have not only a screen on their faces, but a full environmental awareness as well. That’s the Holy Grail these capabilities are heading toward.