In late 2022, ChatGPT had its “iPhone moment” and quickly became the poster child for the Gen AI movement after going viral within days of its release. For the next wave of LLMs, many technologists are looking at the next big opportunity: going small and hyper-local.
The underlying factors driving this next big change are familiar: a better user experience tied to our expectations of instant gratification, and more privacy and security built into user queries handled within smaller local networks, such as the devices we hold in our hands or inside our cars and homes, without the need for round trips to cloud data server farms and back, and the latency that comes with them.
While doubts remain about how quickly local LLMs can match GPT-4’s reported capabilities, such as 1.8 trillion parameters across 120 layers running on a cluster of 128 GPUs, some of the world’s best-known technology innovators are working to bring artificial intelligence “to the edge” to enable new services such as faster, more intelligent voice assistants; localized image generation for rapid production of image and video effects; and other types of user applications.
For example, Meta and Qualcomm announced in July that they were teaming up to run large AI models on smartphones. The goal is to enable Meta’s new large language model, Llama 2, to run on Qualcomm chips in phones and PCs starting in 2024. This promises new LLMs that can avoid cloud data centers, whose massive data processing and computing demands are expensive and becoming a sustainability eyesore for big tech companies, one of the “dirty little secrets” of the nascent AI industry amid concerns over climate change and other natural resources needed, like cooling water.
Challenges of Gen AI working at the edge
Like the path we’ve seen over the years with many types of consumer tech devices, we’ll almost certainly see more powerful processors and memory chips with smaller footprints powered by innovators like Qualcomm. Hardware will continue to evolve following Moore’s Law. But on the software side, there’s been a lot of research, development, and progress in how we can miniaturize and shrink neural networks to fit on smaller devices like smartphones, tablets, and computers.
Neural networks are quite large and heavy. They consume huge amounts of memory and need a lot of processing power to execute because they consist of many equations involving multiplication of matrices and vectors, similar in some ways to the way the human brain is designed to think, imagine, dream, and create.
There are two widely used approaches to reduce the memory and processing power required to run neural networks on edge devices: quantization and vectorization.
Quantization means converting floating-point arithmetic to fixed-point arithmetic, which more or less simplifies the calculations involved: where floating point performs calculations with decimal numbers, fixed point does so with whole numbers. This allows neural networks to take up less memory, since floating-point numbers occupy four bytes while fixed-point numbers generally occupy two or even one byte.
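To make this concrete, here is a minimal sketch of symmetric int8 quantization in NumPy. It is an illustration only, not any particular framework’s implementation: production toolchains also handle zero-points, per-channel scales, and calibration data.

```python
import numpy as np

def quantize_int8(weights):
    """Map float32 weights onto 8-bit integers with a per-tensor scale.

    Simplified symmetric scheme: the largest absolute value maps to 127.
    """
    scale = np.max(np.abs(weights)) / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values to check the precision loss."""
    return q.astype(np.float32) * scale

weights = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(weights)

# int8 storage is 1 byte per value vs. 4 bytes for float32
assert q.nbytes * 4 == weights.nbytes
# the round trip loses at most half a quantization step per value
assert np.max(np.abs(dequantize(q, scale) - weights)) <= scale
```

The four-to-one memory saving comes directly from the byte sizes described above; the price is a small, bounded rounding error in each weight.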
Vectorization, in turn, uses special processor instructions to perform one operation on several pieces of data at once (single instruction, multiple data, or SIMD, instructions). This speeds up the mathematical operations performed by neural networks, as it allows additions and multiplications on several pairs of numbers at the same time.
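The difference between element-at-a-time and vectorized execution can be sketched with a matrix-vector product, the core operation of a neural-network layer. The explicit Python loop below performs one multiply-add at a time, while the `W @ x` call dispatches whole rows to SIMD-enabled kernels; this is an illustrative comparison, and the matrix sizes are arbitrary.

```python
import numpy as np

# A neural-network layer is essentially y = W @ x: many multiply-adds.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256)).astype(np.float32)
x = rng.standard_normal(256).astype(np.float32)

def matvec_scalar(W, x):
    """One multiply-add at a time, the way naive scalar code runs."""
    y = np.zeros(W.shape[0], dtype=np.float32)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            y[i] += W[i, j] * x[j]
    return y

# The vectorized call processes several pairs of numbers per instruction.
y_fast = W @ x

# Both paths compute the same result, up to float32 rounding.
assert np.allclose(matvec_scalar(W, x), y_fast, atol=1e-3)
```

On a typical machine the vectorized version is orders of magnitude faster, which is exactly the headroom that lets heavier models fit within an edge device’s power budget.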
Other approaches increasingly used to run neural networks on edge devices include Tensor Processing Units (TPUs) and Digital Signal Processors (DSPs), which are processors specialized in matrix operations and signal processing, and truncation and low-rank factorization techniques, which involve analyzing and removing parts of the network that do not make a relevant difference to the result.
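Low-rank factorization can be sketched with a truncated SVD: one large weight matrix is replaced by two thin factors that keep only the strongest components. The layer shapes and rank below are hypothetical; for real trained layers the singular-value spectrum often decays quickly, which is what makes a small rank viable.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a trained layer's weight matrix.
W = rng.standard_normal((128, 64)).astype(np.float32)

# Keep only the top-k singular components, replacing one big matrix
# with two thin ones: W (128x64) becomes A (128xk) times B (kx64).
k = 16
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :k] * s[:k]
B = Vt[:k, :]

W_approx = A @ B

# Parameter count drops from 128*64 to 128*k + k*64.
full_params = W.size
low_rank_params = A.size + B.size
assert low_rank_params < full_params
assert W_approx.shape == W.shape
```

Fewer parameters means less memory traffic and fewer multiply-adds per inference, the two resources edge devices are shortest on.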
Techniques like these to shrink and speed up neural networks could enable Gen AI to run on edge devices in the near future.
Killer apps that could be released soon
Smarter automation
By combining Gen AI working locally – on devices or within networks in the home, office or car – with various IoT sensors connected to them, it will be possible to perform data fusion at the edge. For example, there could be smart sensors paired with devices that can listen and understand what’s going on in your environment, triggering context awareness and allowing intelligent actions to happen on their own, like automatically turning off background music during incoming calls, turning on the air conditioning or heating if it gets too hot or cold, and other automations that can happen without the user programming them.
Public safety
From a public safety perspective, there is a lot of potential to improve what we have today by connecting more and more sensors in our cars to sensors on the streets, so they can communicate intelligently with each other and with us over local area networks connected to our devices.
For example, for an ambulance trying to get to a hospital with a patient who needs emergency care to survive, a connected intelligent network of devices and sensors could automate traffic lights and warnings in the car to make way for the ambulance to arrive on time. This kind of connected, smart system could be used to “see” and alert people if they are too close to each other in the event of a pandemic like COVID-19, or to understand suspicious activity caught on network cameras and alert the police.
Telehealth
Using an LLM-enhanced Apple Watch model that could monitor and provide initial advice for health issues, smart sensors with Gen AI on the edge could make it easier to identify potential health problems, from unusual heartbeats, elevated temperatures, and sudden falls to immobility. Paired with video surveillance for those who are elderly or ill at home, Gen AI on the edge could be used to send emergency alerts to family members and doctors, or provide healthcare reminders to patients.
Live events + smart navigation
IoT networks paired with Gen AI at the edge have great potential to improve the experience at live events such as concerts and sports in large venues and stadiums. For those without floor seats, the combination could allow them to tap on a networked camera to watch the event live from a specific angle and location, or even instantly re-watch a moment or play, as you can today with a recording device like a TiVo paired with your TV.
That same networked intelligence in the palm of your hand could help navigate large spaces—from stadiums to shopping malls—to help visitors find where a particular service or product is available in that location by simply searching.
Although these new innovations are at least a few years away, there is a big shift ahead for valuable new services that can be introduced once the technical challenges of shrinking LLMs for use on local devices and networks are solved. Given the added speed, the improved user experience, and the reduced privacy and security concerns of keeping everything local versus the cloud, there’s a lot to love.