February 26, 2016
by Anand Rao
What exactly is “deep learning” and what accounts for its rapid rise in popularity and media coverage?
In the first two posts in this series, we defined machine learning and evaluated three forms of it: supervised learning, unsupervised learning and reinforcement learning. In this post, we look at a type of learning called ‘deep learning’ that has received a lot of press lately. What exactly is it, and what accounts for its rapid rise in popularity?
Since the early days of Artificial Intelligence in the 1950s, there have been a number of attempts to model the neurons in our brains in order to build systems that learn the way humans do. An artificial neural network is an interconnected network of nodes, or units, arranged in multiple layers, where each layer is connected to the layers on either side. Each unit receives input from all the units on its left, multiplies each input by the weight associated with its connection, and sums the results. If the sum exceeds a certain threshold value, the unit ‘fires’, activating the units to its right to which it is connected. This mechanism loosely resembles the way neurons fire in our brains, hence the name ‘artificial neural networks’. However, the real functioning of the brain is far more complex.
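To make the firing rule concrete, here is a minimal Python sketch of a single unit. The function name `unit_fires` and the input, weight and threshold values are assumptions chosen purely for illustration; they are not taken from any particular library or system.

```python
def unit_fires(inputs, weights, threshold):
    """Return True if the weighted sum of the inputs exceeds the threshold."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    # Multiply each incoming signal by the weight of its connection, then add up.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    # The unit 'fires' only when the sum is above the threshold.
    return weighted_sum > threshold


# Hypothetical signals from three units in the previous layer.
signals = [1.0, 0.0, 1.0]
weights = [0.5, 0.9, 0.25]  # weighted sum is 0.5 + 0.0 + 0.25 = 0.75

print(unit_fires(signals, weights, threshold=0.7))  # sum is above the threshold
print(unit_fires(signals, weights, threshold=0.8))  # sum is below the threshold
```

Learning, in this picture, amounts to adjusting the weights so that the unit fires for the right inputs.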
Early artificial neural networks had just a single input layer, a single output layer and one hidden layer. Such networks could learn only limited functions. Although one could make a neural network more expressive and powerful by introducing multiple intermediate layers, the computational power required to learn intermediate features was prohibitive and made these networks impractical. In the mid-2000s, better methods for learning hierarchical features and the use of Graphics Processing Units (GPUs) to run multi-layered artificial neural networks revived interest in the field and produced impressive results in image processing, audio analysis and text mining.
Deep learning is a collection of techniques for building multi-layered, non-linear artificial neural networks that can learn features from the input data. For example, a deep learning algorithm can take in an array of numbers that represents the pixels of an image, run a series of functions on the array, and output whether the image belongs to a particular category, say, pictures of cars. The hidden layers between the input and output nodes learn the different features.
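The "series of functions" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four-pixel "image", the layer sizes, the sigmoid non-linearity and the hand-picked weights are all hypothetical, and in a real deep learning system the weights would be learned from many labeled images rather than written by hand.

```python
import math


def sigmoid(z):
    """Squash any number into the range (0, 1); this is the non-linear step."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """One layer: every unit takes a weighted sum of all inputs, adds its bias,
    and passes the result through the non-linearity."""
    return [
        sigmoid(sum(x * w for x, w in zip(inputs, unit_weights)) + bias)
        for unit_weights, bias in zip(weights, biases)
    ]


def network_forward(pixels, layers):
    """Feed the pixel array through each layer in turn, left to right."""
    activations = pixels
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations


# Hypothetical, hand-picked (not learned) weights:
# 4 pixels -> 3 hidden units -> 2 hidden units -> 1 output unit.
layers = [
    ([[0.2, -0.4, 0.1, 0.5], [-0.3, 0.8, 0.6, -0.1], [0.7, 0.2, -0.5, 0.3]],
     [0.0, 0.1, -0.2]),
    ([[0.6, -0.1, 0.4], [-0.5, 0.9, 0.2]], [0.05, -0.05]),
    ([[1.2, -0.7]], [0.1]),
]

pixels = [0.0, 0.5, 1.0, 0.25]  # a toy four-pixel grayscale "image"
score = network_forward(pixels, layers)[0]
print(f"car score: {score:.3f}")
print("car" if score > 0.5 else "not a car")
```

The output is a single number between 0 and 1 that can be read as the network's confidence that the image belongs to the category. Adding more entries to `layers` makes the network deeper without changing any of the functions.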
Let’s say the input image is represented as a set of nodes, one for each pixel. The next layer detects simple features within the image, e.g., lines and curves. Subsequent layers combine these features to detect shapes such as doors and headlights. Later layers still combine those shapes to detect that the picture is of a car, and even of a particular make and model. Recent deep learning networks can have 150 or more intermediate layers of nodes that learn a hierarchy of features.
Deep learning today is being used not only to recognize images, but also to understand and process voice, video, and large bodies of text. It is particularly good at tasks that we do involuntarily and often cannot articulate precisely enough for a machine to follow. For example, we can tell a cat from a dog under different lighting conditions, or even when parts of the image are hidden. Similarly, we may be able to identify the voice of Humphrey Bogart, but would be unable to say precisely how we distinguish his voice from those of others. It is these tasks that deep learning algorithms excel at.
In the next post in this series, we’ll talk about applications for deep learning and how you can begin to explore this emerging technology to advance your business.