Deep Neural Networks and GPUs
A first online search for "Artificial Intelligence" or "Deep Learning" reveals astonishing results. The term "Deep Learning" refers to the use of Deep Neural Networks (DNNs), the technology behind most of the AI systems responsible for recent successes.
Here is a brief and certainly incomplete list of breakthroughs powered by DNN technology:
Image and character identification
The standard benchmark for character recognition is the MNIST dataset, comprising 60,000 training images (plus 10,000 test images) of handwritten digits 0 to 9, scanned as 28x28-pixel grayscale images. The best DNN topologies applied to MNIST achieve superhuman recognition rates. The CIFAR-10 image recognition benchmark also lists two networks with recognition rates in the superhuman domain.
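As a rough illustration of the data format (a sketch with a synthetic image, not actual MNIST loading code): each 28x28 grayscale image can be flattened into a 784-element vector with pixel values normalized to [0, 1], and its digit label encoded as a one-hot target vector, before being fed to a network.

```python
# Sketch of preparing one MNIST-style sample (synthetic image, not real MNIST data).

def flatten_and_normalize(image):
    """Flatten a 28x28 grayscale image (pixel values 0-255) into a
    784-element vector with values scaled to [0, 1]."""
    return [pixel / 255.0 for row in image for pixel in row]

def one_hot(digit, num_classes=10):
    """Encode a digit label 0-9 as a one-hot target vector."""
    return [1.0 if i == digit else 0.0 for i in range(num_classes)]

# A dummy mid-gray image standing in for a scanned digit.
image = [[128] * 28 for _ in range(28)]
x = flatten_and_normalize(image)
y = one_hot(7)

print(len(x), y.index(1.0))  # 784 7
```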
In medical diagnostics, DNNs achieve identification rates for cancerous tissue on par with the best experts in the field. This is the hallmark of a revolution in the automated screening of medical imagery.
Autonomous vehicles are all the rage, heralding the era of driverless cars, aircraft and ships. Since it is impossible to anticipate every condition an autonomous vehicle could encounter in daily operation, the task eludes the classical programmatic approach. This has led to the introduction of DNNs into the field. Among other contenders, Nvidia's BB8 test car is powered by a DNN trained on typical road situations; it makes efficient use of the car's built-in sensors and has an impressive driving record.
Artificial neural networks are inspired by the neurons and synapses of biological neural tissue and the way they interconnect; in contemporary literature, the qualifier "artificial" is often omitted. These networks exhibit a high degree of parallelism. They consist of layered stacks of neurons, where each interior layer is connected to its preceding and succeeding layers. In a DNN topology, the number of such layers can range from 10 to over 100. The synapses of biological neurons are represented by single scalar-valued weights, a maximally simplified version of the original. The training effort aims at finding a weight matrix for the entire network that minimizes the error on training datasets and live data. A biologically realistic simulation, as in the Hodgkin-Huxley model, would require computational power far exceeding current and near-future hardware performance.
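The layered structure described above can be sketched in a few lines of plain Python. Layer sizes, weight values and the activation function here are illustrative choices, not taken from any particular system: each neuron computes a weighted sum of the preceding layer's outputs, adds a bias, and applies a nonlinearity.

```python
import math

def sigmoid(z):
    """Standard logistic activation."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """One fully connected layer: each neuron weights all outputs of the
    preceding layer, adds a bias, and applies the activation function."""
    return [sigmoid(sum(w * x for w, x in zip(neuron_w, inputs)) + b)
            for neuron_w, b in zip(weights, biases)]

# A tiny 3-4-2 stack: 3 inputs, a hidden layer of 4 neurons, 2 outputs.
# All weights and biases are arbitrary illustrative values.
hidden_w = [[0.1, -0.2, 0.3], [0.4, 0.1, -0.1], [-0.3, 0.2, 0.2], [0.0, 0.5, -0.4]]
hidden_b = [0.1, 0.0, -0.1, 0.2]
out_w = [[0.2, -0.1, 0.3, 0.1], [-0.2, 0.4, 0.0, 0.2]]
out_b = [0.0, 0.1]

x = [0.5, 0.9, -0.3]
hidden = layer_forward(x, hidden_w, hidden_b)
output = layer_forward(hidden, out_w, out_b)
print(output)  # two activations, each strictly between 0 and 1
```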
In a DNN, training is done with the backpropagation algorithm, the pivotal breakthrough that made DNNs practicable for real-world applications.
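A minimal sketch of what backpropagation computes, on a toy two-weight network with illustrative numbers: the analytic gradient of the loss with respect to a weight, obtained by applying the chain rule backwards through the layers, agrees with a brute-force numerical estimate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w1, w2, target):
    """Tiny chain: input -> hidden sigmoid -> output sigmoid -> squared error."""
    h = sigmoid(w1 * x)
    y = sigmoid(w2 * h)
    loss = 0.5 * (y - target) ** 2
    return h, y, loss

def backprop_dw1(x, w1, w2, target):
    """Analytic gradient dLoss/dw1 via the chain rule, i.e. what
    backpropagation computes layer by layer."""
    h, y, _ = forward(x, w1, w2, target)
    dloss_dy = y - target
    dy_dh = y * (1.0 - y) * w2     # derivative through the output sigmoid
    dh_dw1 = h * (1.0 - h) * x     # derivative through the hidden sigmoid
    return dloss_dy * dy_dh * dh_dw1

# Illustrative values.
x, w1, w2, target = 0.7, 0.5, -1.2, 1.0
analytic = backprop_dw1(x, w1, w2, target)

# Brute-force central-difference gradient for comparison.
eps = 1e-6
loss_plus = forward(x, w1 + eps, w2, target)[2]
loss_minus = forward(x, w1 - eps, w2, target)[2]
numeric = (loss_plus - loss_minus) / (2 * eps)

print(abs(analytic - numeric) < 1e-8)  # True: the chain-rule gradient matches
```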
The training effort is very compute-intensive, with an emphasis on floating-point performance. It requires a large number of epochs, i.e. iterative passes over the complete training dataset. A reasonable training run of a DNN can take hundreds of epochs to reduce the error below a preset value. A typical dual-CPU server system with 16 to 44 cores is insufficient for DNN training on large datasets, such as high-resolution images captured by an autonomous vehicle.
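The epoch loop described above can be sketched as follows, on a single neuron and a toy dataset (logical AND); the learning rate, epoch budget and error threshold are arbitrary choices for illustration. Each epoch sweeps the whole dataset once, and training stops when the total error drops below the preset value or the epoch budget is exhausted.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy dataset: logical AND of two binary inputs.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

w = [0.0, 0.0]   # weights
b = 0.0          # bias
lr = 0.5         # learning rate (arbitrary choice)
target_error = 0.05

for epoch in range(1, 2001):              # epoch budget
    total_error = 0.0
    for (x1, x2), t in data:              # one epoch = one pass over the dataset
        y = sigmoid(w[0] * x1 + w[1] * x2 + b)
        err = y - t
        total_error += 0.5 * err * err
        delta = err * y * (1.0 - y)       # gradient through the single neuron
        w[0] -= lr * delta * x1
        w[1] -= lr * delta * x2
        b -= lr * delta
    if total_error < target_error:        # stop once the preset error is reached
        break

predictions = [round(sigmoid(w[0] * x1 + w[1] * x2 + b)) for (x1, x2), _ in data]
print(epoch, predictions)
```

Even this trivial example takes many epochs to push the error down, hinting at why full-scale DNN training is so costly.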
This limitation led to the extension of DNN codes to incorporate the Nvidia CUDA software stack, which maps linear algebra operations onto GPUs. The precise term is GPGPU, "General-Purpose computation on Graphics Processing Units", usually shortened to GPU.
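The linear-algebra workhorse that gets offloaded is, above all, dense matrix multiplication: the weighted sums of a whole layer, for a whole batch of inputs, collapse into a single matrix-matrix product, which is exactly the kind of regular, massively parallel work a GPU excels at. A plain-Python sketch of the operation (real stacks call tuned GPU kernels instead):

```python
def matmul(a, b):
    """Naive dense matrix product C = A * B; this triple loop is what
    GPU libraries execute with thousands of parallel threads."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(cols)]
            for i in range(rows)]

# A batch of 2 input vectors (rows) times a 3x2 weight matrix
# yields the pre-activation sums of a 2-neuron layer for both inputs.
inputs = [[1.0, 2.0, 3.0],
          [0.5, 0.0, -1.0]]
weights = [[0.1, 0.4],
           [0.2, 0.5],
           [0.3, 0.6]]
print(matmul(inputs, weights))
```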
In recent months, the CUDA stack has been extended with dedicated libraries for DNN operations. The parallel topology of DNNs allows efficient use of a GPU's processing units; the speedup over a typical dual-CPU configuration can be in the range of 10 to 50. State-of-the-art dual-CPU servers host four to eight GPU cards, such as Nvidia's Pascal-based Tesla or GeForce models. This amounts to a sustained performance of about 100 TFlops single precision (32-bit) in a single system.
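The headline figure is simple arithmetic; the per-card throughput below is an assumed round number, as actual FP32 ratings of Pascal-class cards vary by model (roughly 9 to 12 TFlops).

```python
# Back-of-the-envelope check of the ~100 TFlops figure.
# per_card_tflops is an assumed round number for a Pascal-class card.
per_card_tflops = 12.5
cards = 8
total_tflops = per_card_tflops * cards
print(total_tflops)  # 100.0
```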
Nvidia's dominance in GPU computing will soon be challenged: AMD has announced its "Instinct" series of GPUs tailor-made for DNN workloads, which will become available in 2017.
Intel's next-generation Xeon Phi CPU, code-named "Knights Landing", has up to 72 "Silvermont"-based Atom cores on die and is available as a socket version for use on a suitable mainboard, making a separate Xeon CPU unnecessary. The advantage of this version is that it can access the system's DIMM sockets, so the limited amount of memory on an add-on card is no longer an issue. This benefits large datasets that do not fit entirely into the 12 GB to at most 24 GB of a PCIe GPU or Xeon Phi card.
Another development to follow is the use of FPGAs (Field Programmable Gate Arrays) for DNNs. These devices can be configured by software, allowing topologies custom-designed for a given DNN task. Intel's acquisition of FPGA manufacturer Altera has borne fruit in Xeon E5 v4 CPUs equipped with an FPGA in a multi-chip package.
The big cloud service providers, such as Google and Microsoft, are developing their own DNN accelerators based on FPGAs or special ASICs, like Google's TPU, the "Tensor Processing Unit". Such specialized devices offer the best possible performance-per-watt ratio, a key optimization target for large server farms. At the other end of the scale, FPGAs are used in embedded systems with DNN workloads to maximize battery lifetime. The IoT business profits from this particular development, as DNNs can be deployed in endpoint devices alongside sensor circuits, allowing data processing immediately after the sensor layer.
All these trends will become more dominant in 2017, due to the continuing growth of DNN technology.
There is an ever-increasing demand for computational power in scientific and industrial research. At the same time, high-performance computing systems consume a great deal of power.
Machine learning covers a wide range of technologies with roots in statistics, neurobiology and computer science. A particularly exciting field within machine learning is neural networks, which are modeled on biological structures.