AI models have the potential for huge carbon footprints. To make them energy efficient, we need to tune innovations in data usage, hardware and software, argues Imec’s Axel Nackaerts.
Artificial intelligence and machine learning algorithms are becoming mainstream in multiple domains, from industrial diagnostics, healthcare and city management to personal voice assistants and robot vacuum cleaners. As their user base grows, so does the total energy consumed in creating, training and executing these algorithms. In particular, the recent introduction of gigantic models, consisting of hundreds of billions of parameters, has raised questions about their environmental impact.
Towards energy-efficient AI: evolution or revolution?
Axel Nackaerts is the opening speaker of the Bits&Chips Machine Learning Conference 2021. In his talk, he will place the energy needs of AI in a global context, show how its historical development influenced energy consumption and point to several hardware and software techniques that improve the situation.
To get a better understanding of the total footprint of AI models, we should consider two factors: training and inference. First, an AI model needs to be trained on a labeled dataset. The trend towards ever bigger datasets for this training phase is causing explosive growth in energy consumption. Researchers from the University of Massachusetts calculated that the training of a model for natural language processing emits 284 metric tons of carbon dioxide. This is equivalent to the emissions of five cars over their entire lifespan, including construction. Some AI models developed by tech giants – not reported on in scientific literature – might even be orders of magnitude bigger.
The training phase is just the beginning of the AI model’s lifecycle. Once the model is trained, it’s ready for the real world: finding meaningful patterns in new data. This process, called inference, might consume even more energy. Unlike training, inference isn’t a one-off: it takes place continuously. For example, every time a voice assistant is asked a question and generates an answer, extra carbon dioxide is released. After about a million inference events, the impact will surpass that of the training phase. This process is unsustainable.
Today, both training and inference are typically performed in datacenters. Beyond the energy involved in the calculations, we should consider the transmission energy of sending data from the device to the datacenter and back. We could avoid part of that traffic by moving the inference process to the device where the data is captured. Besides saving a lot of energy, we could also save time. Latency is a key concern, especially for image classification in self-driving cars, where immediate decisions are of vital importance. Decentralizing the data processing would also be a good idea in terms of privacy and security: if your personal information never leaves your phone, it can’t be intercepted. By bringing the intelligence to the data-collecting device, we’re even no longer dependent on an internet connection.
So, what’s keeping us from implementing this on-device inference? Well, the inference processors today are just too power-hungry for use in battery-powered edge devices. The hardware was designed for performance and precision instead of energy efficiency.
The silver lining: research into completely new hardware architectures, aimed at drastically increasing energy efficiency, is picking up very quickly. A radical departure from conventional digital calculations promises an improvement of several orders of magnitude. Pathfinding is being done in new compute-in-memory architectures, exploiting the most advanced logic and memory components.
Recently, Imec demonstrated an analog inference accelerator achieving 2,900 trillion operations per joule – already ten times more energy efficient than today’s digital accelerators. With these types of hardware innovations, it will become possible to process data directly in battery-powered devices – including drones and vehicles – and so avoid the transmission energy. However, developing energy-efficient hardware is only one side of the solution.
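To put that efficiency figure in perspective, here is a back-of-the-envelope conversion, offered as a rough sketch. The analog figure comes from the text; the digital baseline is simply derived from the "ten times" comparison rather than measured, and the workload of one billion operations per inference is a made-up number for illustration only.

```python
# Back-of-the-envelope conversion of the efficiency figure quoted in the text.
OPS_PER_JOULE_ANALOG = 2_900e12                     # 2,900 trillion operations per joule
OPS_PER_JOULE_DIGITAL = OPS_PER_JOULE_ANALOG / 10   # implied by "ten times", not measured


def energy_per_op_fj(ops_per_joule):
    """Energy of a single operation in femtojoules (1 fJ = 1e-15 J)."""
    return 1.0 / ops_per_joule / 1e-15


def inference_energy_uj(ops_per_inference, ops_per_joule):
    """Compute-only energy of one inference in microjoules (1 uJ = 1e-6 J)."""
    return ops_per_inference / ops_per_joule / 1e-6


# Hypothetical workload: one billion operations per inference (assumption).
OPS_PER_INFERENCE = 1e9

analog_fj = energy_per_op_fj(OPS_PER_JOULE_ANALOG)
analog_uj = inference_energy_uj(OPS_PER_INFERENCE, OPS_PER_JOULE_ANALOG)
digital_uj = inference_energy_uj(OPS_PER_INFERENCE, OPS_PER_JOULE_DIGITAL)

print(f"analog: {analog_fj:.3f} fJ per operation")
print(f"one inference: {analog_uj:.3f} uJ analog vs {digital_uj:.3f} uJ digital")
```

Under these assumptions, a single operation costs roughly a third of a femtojoule. The sketch counts compute energy only and ignores memory access, data conversion and transmission.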
Running inefficient algorithms on an energy-efficient accelerator will wipe out the hardware’s benefits. Therefore, we also need to develop energy-efficient software techniques. This isn’t only necessary for on-device inference, but also to reduce the number of calculations during inference or training of algorithms in datacenters.
To reduce the energy consumption during the training phase, we can draw inspiration from our own nature: if you’re a good tennis player, learning how to play squash is only a small step. Similarly, with a technique called transfer learning, we can transfer an existing AI model trained in one domain to an adjacent one.
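As a minimal sketch of the idea, the toy example below reuses a frozen "pretrained" layer and trains only a small new output layer on a handful of samples from the new task. Everything in it is hypothetical: the frozen weights, the target task and the hyperparameters are invented for illustration, and real transfer learning would reuse a full network trained on real data.

```python
import math

# Hypothetical "pretrained" hidden layer: (weight, bias) per tanh unit, standing
# in for parameters learned on a source task. It is frozen and never updated.
FROZEN_LAYER = [(1.0, 0.0), (0.5, 0.5), (2.0, -1.0)]


def features(x):
    """Frozen feature extractor reused from the source domain."""
    return [math.tanh(w * x + b) for w, b in FROZEN_LAYER]


def predict(head, bias, x):
    """New linear output layer on top of the frozen features."""
    return sum(h * f for h, f in zip(head, features(x))) + bias


def mse(head, bias, samples):
    """Mean squared error of the output layer on (input, target) pairs."""
    return sum((predict(head, bias, x) - y) ** 2 for x, y in samples) / len(samples)


def train_head(samples, epochs=200, lr=0.1):
    """Fit only the output layer with plain stochastic gradient descent."""
    head, bias = [0.0] * len(FROZEN_LAYER), 0.0
    for _ in range(epochs):
        for x, y in samples:
            f = features(x)
            err = sum(h * fi for h, fi in zip(head, f)) + bias - y
            head = [h - lr * err * fi for h, fi in zip(head, f)]
            bias -= lr * err
    return head, bias


# Made-up target task with only nine labeled samples.
target_samples = [(k / 2, 2 * math.tanh(k / 2) + 1) for k in range(-4, 5)]

untrained_error = mse([0.0] * len(FROZEN_LAYER), 0.0, target_samples)
head, bias = train_head(target_samples)
trained_error = mse(head, bias, target_samples)

trainable = len(head) + 1        # parameters updated for the new task
frozen = 2 * len(FROZEN_LAYER)   # parameters reused without retraining
print(f"error before: {untrained_error:.4f}, after: {trained_error:.4f}")
print(f"trained {trainable} of {trainable + frozen} parameters")
```

The energy argument is visible in the parameter count: gradients are computed for the new output layer only, while the reused layer costs nothing beyond a forward pass.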
After training, we can further minimize the number of calculations by applying compression strategies. The most appealing one is network pruning: we ‘prune out’ all the parameters that have little importance for the end result. What remains is an algorithm that has the same functionality but is smaller, faster and more energy efficient. With the help of this compression strategy, the number of calculations can already be reduced by 30-50 percent. Beyond that, more application-specific techniques will help us to further improve efficiency. By applying these techniques, we can cut power consumption by more than 90 percent, apart from any hardware considerations.
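One common way to decide which parameters have "little importance" is to look at their magnitude. The sketch below assumes that criterion (the text doesn’t say which one is meant) and applies it to a made-up 4×4 weight matrix; the function names and the 50 percent sparsity target are illustrative choices.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy with the smallest-magnitude fraction of weights set to zero.

    Ties at the threshold are pruned too, so slightly more than the requested
    fraction can be removed.
    """
    magnitudes = sorted(abs(w) for row in weights for w in row)
    n_prune = int(len(magnitudes) * sparsity)
    if n_prune == 0:
        return [list(row) for row in weights]
    threshold = magnitudes[n_prune - 1]
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]


def nonzero_count(weights):
    """Multiplications needed by a matrix-vector product that skips zeros."""
    return sum(1 for row in weights for w in row if w != 0.0)


# Made-up weight matrix for illustration only.
dense = [
    [0.9, -0.05, 0.4, 0.01],
    [-0.7, 0.02, -0.3, 0.6],
    [0.03, 0.8, -0.04, 0.5],
    [-0.2, 0.06, 0.1, -1.1],
]

pruned = magnitude_prune(dense, sparsity=0.5)
print(nonzero_count(dense), "->", nonzero_count(pruned), "multiplications")
```

The saving only materializes if the software or hardware actually skips the zeroed weights, and in practice a pruned network is usually fine-tuned and re-validated to confirm that accuracy has held up.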
Today, these energy-efficient software techniques are already applied in a hardware-agnostic way. The next step is to further improve the efficiency of the algorithms by adapting them to the specifics of the hardware. For example, if a memory block corresponds to a 16×16 matrix, the matrices in the neural network should ideally have the same dimensions. If the matrix dimension is 20 instead, the elements that don’t fit in one block have to be retrieved separately from memory, causing overhead and, ultimately, extra energy consumption. The main lesson is that, when designing energy-efficient machine learning algorithms, we had better take the structure of the hardware into account.
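The 16-versus-20 example can be made concrete with a simplified model, sketched below under the assumption that a matrix is stored in whole 16×16 blocks and that partially filled blocks cost as much as full ones. The function name and this cost model are illustrative, not a description of any particular chip.

```python
import math


def tile_utilization(rows, cols, tile=16):
    """Return (blocks needed, fraction of block capacity actually used)."""
    blocks = math.ceil(rows / tile) * math.ceil(cols / tile)
    capacity = blocks * tile * tile
    return blocks, rows * cols / capacity


for size in (16, 20, 32):
    blocks, used = tile_utilization(size, size)
    print(f"{size}x{size}: {blocks} block(s), {used:.0%} of capacity used")
```

In this model, a 20×20 matrix occupies four blocks, as many as a 32×32 matrix, while filling only about 39 percent of them. Choosing layer sizes that are multiples of the block dimension avoids that waste.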
Conversely, when developing a new hardware architecture, we had better consider the type of calculations we need to perform. Neural networks often depend on large vector-matrix multiplications, and analog accelerators are well suited for this task. New hybrid chip designs, combining analog accelerators and conventional digital accelerators, will allow us to automatically run each operation on the processor that’s most suitable for it.
The co-optimization of innovations in data usage, hardware and software will yield the biggest reduction in energy consumption. To create truly energy-efficient AI systems, we thus need an integrated approach that tunes these innovations.