Most early and popular ideas around augmented reality (AR) focus on augmenting human vision, and technology was developed to support these ideas. The camera plays the main role in this type of AR. A camera paired with a computer (such as a smartphone) uses computer vision (CV) to scan its surroundings, and content is superimposed on the camera view. Many modern AR applications use the smartphone’s camera to show 3D objects in real space without special markers. This method is sometimes called markerless AR. However, this was not always the standard. A number of techniques are used to augment content on the camera view.
Fiducial markers and images
Fiducial markers are black-and-white patterns, often printed on a flat surface. The computer vision algorithm detects these markers in the camera image and uses their position and apparent size to place and scale the 3D object in the camera view. Earlier AR solutions regularly relied on fiducial markers. Ordinary images can also be used in place of fiducial markers. Fiducial markers are among the most accurate mechanisms for AR content creation and are regularly used in motion capture (MOCAP) in the film industry.
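As a rough illustration of how a detected marker can drive placement and scale, here is a minimal Python sketch. It assumes a separate detector has already returned the marker's four corner pixels, and that the marker's printed side length and the camera's focal length (in pixels) are known. The function name `marker_placement` and all parameters are hypothetical, and the distance estimate only holds when the marker roughly faces the camera.

```python
import math


def marker_placement(corners, marker_side_m, focal_px):
    """Estimate where and how large to draw content on a detected square marker.

    corners: four (x, y) pixel positions ordered top-left, top-right,
             bottom-right, bottom-left (assumed to come from a detector).
    marker_side_m: printed side length of the marker in meters.
    focal_px: camera focal length in pixels (assumed known from calibration).
    """
    # Center of the marker: the average of its four corners.
    cx = sum(x for x, _ in corners) / 4.0
    cy = sum(y for _, y in corners) / 4.0

    # Apparent side length in pixels: mean of the four edge lengths.
    edges = [math.dist(corners[i], corners[(i + 1) % 4]) for i in range(4)]
    side_px = sum(edges) / 4.0

    # In-plane rotation, taken from the top edge (top-left -> top-right).
    (x0, y0), (x1, y1) = corners[0], corners[1]
    angle_deg = math.degrees(math.atan2(y1 - y0, x1 - x0))

    # Pinhole approximation: distance = focal length * real size / apparent size.
    distance_m = focal_px * marker_side_m / side_px

    return {
        "center": (cx, cy),
        "pixels_per_meter": side_px / marker_side_m,
        "angle_deg": angle_deg,
        "distance_m": distance_m,
    }


# Example: a 10 cm marker seen as a 100-pixel square.
placement = marker_placement(
    [(100, 100), (200, 100), (200, 200), (100, 200)], 0.1, 800.0
)
print(placement)
```

A renderer could draw the 3D object at `center`, rotate it by `angle_deg`, and scale it by `pixels_per_meter`. Full AR pipelines usually estimate a complete 3D pose instead of this flat approximation.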
3D depth sensing
With ‘You are the controller’ as its tagline, Microsoft’s Kinect was a revolutionary device for augmented reality research. It is a 3D depth-sensing camera that recognizes and maps spatial data. 3D depth sensing was available long before the Kinect; however, the Kinect made the technology far more accessible. It changed the way regular computers see and augment natural environments. Depth-sensing cameras analyze and map spatial environments to place 3D objects in the camera view. A more mainstream depth-sensing camera in recent times is the iPhone X’s front camera.
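To make the mapping step more concrete, the sketch below shows one common way a depth reading can be turned into a 3D position: back-projection through a pinhole camera model. This is an illustrative assumption, not a description of how the Kinect or any specific device works internally. The intrinsics `fx`, `fy`, `cx`, `cy` and both function names are hypothetical.

```python
def backproject(u, v, depth_m, fx, fy, cx, cy):
    """Convert pixel (u, v) with a depth in meters into a camera-space 3D point.

    fx, fy: focal lengths in pixels; cx, cy: principal point in pixels.
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def depth_image_to_points(depth, fx, fy, cx, cy):
    """Turn a 2D grid of depth values (rows of meters) into a list of 3D points.

    A depth of 0 is treated as "no reading" and skipped. That convention is
    assumed here and is not taken from any particular device.
    """
    points = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d > 0:
                points.append(backproject(u, v, d, fx, fy, cx, cy))
    return points


# Example: a tiny 2x2 depth image with two valid readings.
tiny_depth = [[0.0, 1.0],
              [2.0, 0.0]]
print(depth_image_to_points(tiny_depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0))
```

Once every pixel has a 3D position, a virtual object can be placed at one of those positions and drawn at the matching depth in the camera view.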
Simultaneous localization and mapping (SLAM)
For a robot or a computer to move through or augment an environment, it needs to map the environment and understand its own location within it. Simultaneous localization and mapping (SLAM) is a technology that enables just that. It was originally built for robots to navigate complex terrain and is even used by Google’s self-driving car. As the name suggests, SLAM enables real-time mapping of an environment, generating a 3D map with the help of a camera and a few sensors. The computer can use this 3D map to place multimedia content in the environment.
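The sketch below does not implement SLAM itself. It only illustrates, in 2D and under simplifying assumptions, why the localization half matters: once the camera's pose in the map is known, a piece of content fixed in the map can be re-expressed relative to the camera every frame. The function name `world_to_camera` and the frame convention are hypothetical.

```python
import math


def world_to_camera(point_w, cam_pos, cam_heading_rad):
    """Express a map-frame 2D point in the camera's own frame.

    point_w: (x, y) of the content in map coordinates.
    cam_pos: (x, y) of the camera in map coordinates (from localization).
    cam_heading_rad: direction the camera faces, measured from the map's +x axis.
    Returns (forward, left) under the assumed convention that the camera's
    +x axis points forward and its +y axis points left.
    """
    dx = point_w[0] - cam_pos[0]
    dy = point_w[1] - cam_pos[1]
    c, s = math.cos(cam_heading_rad), math.sin(cam_heading_rad)
    # Rotate the offset by the inverse of the camera heading.
    forward = c * dx + s * dy
    left = -s * dx + c * dy
    return (forward, left)


# The content stays fixed at (0, 3) in the map while the camera moves and turns.
content = (0.0, 3.0)
print(world_to_camera(content, cam_pos=(0.0, 0.0), cam_heading_rad=math.pi / 2))
print(world_to_camera(content, cam_pos=(0.0, 1.0), cam_heading_rad=math.pi / 2))
```

The content's map position never changes. Only its position relative to the camera does, which is what keeps it visually pinned to the real world as the device moves.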
3D depth-sensing cameras like Microsoft’s Kinect and Intel’s RealSense, as well as SLAM, generate a set of data points in space known as a point cloud. The computer references point clouds to place content in 3D environments. Once mapped to an environment, they enable the system to remember where a 3D object is placed in that environment, or even at a particular GPS location.
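One simple way to picture this "remembering" is an anchor store: content is snapped to the nearest mapped point and saved under a name, so it can be found again later. The sketch below is a toy version under that assumption. `AnchorStore`, `nearest_point`, and the linear search are hypothetical choices for illustration, and real systems use far larger clouds and spatial indexes.

```python
import math


def nearest_point(cloud, target):
    """Return the point in the cloud closest to target (simple linear scan)."""
    return min(cloud, key=lambda p: math.dist(p, target))


class AnchorStore:
    """Hypothetical in-memory store mapping anchor names to map positions."""

    def __init__(self, cloud):
        self.cloud = cloud   # list of (x, y, z) points describing the environment
        self.anchors = {}    # anchor name -> (x, y, z) position in the same frame

    def place(self, name, requested_position):
        # Snap the requested position onto the nearest mapped point so the
        # content sits on something the system has actually observed.
        self.anchors[name] = nearest_point(self.cloud, requested_position)
        return self.anchors[name]

    def recall(self, name):
        # Look up where the content was placed in an earlier session.
        return self.anchors[name]


# Example: three mapped points and one piece of content.
cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 2.0)]
store = AnchorStore(cloud)
store.place("lamp", (0.9, 0.1, 0.0))
print(store.recall("lamp"))
```

Because the anchor is stored in the map's coordinates, it remains valid for as long as the system can relocate itself within the same point cloud.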
Machine learning + normal camera
Earlier AR methods relied on a multitude of sensors in addition to the camera. Software libraries like OpenCV, Vuforia, ARCore, ARKit, and MRKit have enabled AR on small computing devices like the smartphone with surprising accuracy. These libraries use machine learning algorithms to place 3D objects in the environment and require only a digital camera for input. The frugality of these algorithms in terms of sensor requirements has largely been responsible for the excitement around AR in recent times.