How hard will Nvidia fight adoption, given that the technique requires a hardware redesign?
A hot potato: As more businesses jump on the AI bandwagon, the power consumption of AI models is becoming an urgent matter. While the most prominent players – Nvidia, Microsoft, and OpenAI – have downplayed the issue, one company claims it has come up with an answer.
Researchers at BitEnergy AI have developed a technique that could dramatically reduce AI power consumption without sacrificing much accuracy or speed. The study claims the method could cut energy use by as much as 95 percent. The team calls the breakthrough Linear-Complexity Multiplication, or L-Mul for short.
The computation technique relies on integer addition, which requires far less power and fewer steps than the floating-point multiplication that AI applications typically use.
Floating-point numbers are used heavily in AI computations whenever values are extraordinarily large or small. They are essentially scientific notation in binary format, which lets AI models perform complex calculations with high precision. That precision, however, comes at a cost in energy.
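To make "scientific notation in binary" concrete, here is a minimal Python sketch (not from the paper) that splits a 32-bit float into the sign, exponent, and mantissa fields defined by the IEEE-754 standard:

```python
import struct

def decompose_float32(x: float) -> tuple[int, int, float]:
    """Split a float32 into its sign, exponent, and mantissa.

    IEEE-754 single precision stores x as
    (-1)^sign * (1 + mantissa/2^23) * 2^(exponent - 127).
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF    # biased exponent (8 bits)
    mantissa = bits & 0x7FFFFF        # fractional part (23 bits)
    return sign, exponent - 127, 1 + mantissa / 2**23

if __name__ == "__main__":
    s, e, m = decompose_float32(6.5)
    # 6.5 = +1.625 * 2^2
    print(f"sign={s}, exponent={e}, mantissa={m}")
```

Multiplying two such numbers means multiplying the mantissas and adding the exponents, and the mantissa multiplication is the expensive part in hardware.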
The energy requirements of the booming AI industry have reached alarming levels, with some models consuming huge amounts of electricity. For example, ChatGPT uses an estimated 564 MWh daily, roughly the electricity consumed by 18,000 US homes. Analysts at the Cambridge Centre for Alternative Finance estimate that the AI industry could consume between 85 and 134 TWh annually by 2027.
The L-Mul algorithm tackles this waste by approximating complex floating-point multiplications with simpler integer additions. In testing, AI models maintained accuracy while reducing energy consumption by 95 percent for tensor multiplications and 80 percent for dot products.
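BitEnergy AI's exact formulation is laid out in the paper; the snippet below is only a simplified Python illustration of the general idea that, for positive IEEE-754 floats, a single integer addition on the raw bit patterns approximates a multiplication (the constant BIAS and the function names here are illustrative, not taken from the paper):

```python
import struct

BIAS = 127 << 23  # float32 exponent bias, shifted into the exponent field

def float_to_bits(x: float) -> int:
    """Reinterpret a float32 as its 32-bit integer pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_float(b: int) -> float:
    """Reinterpret a 32-bit integer pattern as a float32."""
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]

def approx_mul(a: float, b: float) -> float:
    """Approximate a * b (both positive) with one integer addition.

    Adding the bit patterns sums the exponents and roughly sums the
    mantissas, which approximates multiplication because a float's
    bit pattern is close to a scaled log2 of its value.
    """
    return bits_to_float(float_to_bits(a) + float_to_bits(b) - BIAS)

if __name__ == "__main__":
    for a, b in [(1.5, 2.25), (3.14, 0.5), (0.125, 7.0)]:
        exact = a * b
        approx = approx_mul(a, b)
        rel_err = abs(approx - exact) / exact
        print(f"{a} * {b}: exact={exact:.4f} approx={approx:.4f} err={rel_err:.2%}")
```

For these sample inputs the relative error stays within a few percent, which hints at why an addition-based approximation can get surprisingly close; the paper's L-Mul construction refines the idea to keep the losses smaller still.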
L-Mul does more than save energy. The algorithm exceeds current 8-bit computation standards, achieving higher precision with fewer bit-level operations. Tests covering a range of AI tasks, including natural language processing and machine vision, showed only a 0.07-percent average performance drop, a small tradeoff for the power savings.
L-Mul is especially well suited to transformer-based models, such as GPT, because the algorithm integrates naturally into the attention mechanism, one of the most important yet most energy-intensive components of these systems. Indeed, tests with well-known models such as Llama and Mistral even showed improved accuracy on some tasks.
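For context, here is a bare-bones NumPy sketch of the scaled dot-product attention that transformers rely on (generic textbook code, not BitEnergy's implementation); the two matrix products are where the floating-point multiplications that L-Mul targets are concentrated:

```python
import numpy as np

def attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)      # seq_len^2 * d multiplications
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                 # another seq_len^2 * d multiplications

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq_len, d = 4, 8
    Q, K, V = (rng.standard_normal((seq_len, d)) for _ in range(3))
    print(attention(Q, K, V).shape)    # (4, 8)
```

Because the multiplication count grows with the square of the sequence length, replacing those multiplications with cheaper additions is where the bulk of the claimed savings would come from.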
However, there is good news and bad news. The bad news is that L-Mul currently requires specialized hardware, and today's AI processors are not optimized to take advantage of the technique. The good news is that plans for dedicated hardware and programming APIs are already in the works, paving the way for more energy-efficient AI within a reasonable timeframe.
The only other obstacle would be players, notably Nvidia, hampering adoption efforts, which is a genuine possibility. The GPU maker has made a name for itself as the go-to hardware supplier for AI applications, and it is unlikely to throw up its hands in favor of more energy-efficient hardware when it holds the lion's share of the market.
For those who live for complicated mathematical explanations, a preprint version of the study is available in Cornell University's arXiv library.