Cloudflare has announced that Workers AI is now generally available. Workers AI is a solution that allows developers to run machine learning models on the Cloudflare network.
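As a rough illustration of what an inference call looks like, the sketch below builds a request against Cloudflare's documented REST endpoint pattern (`/accounts/{account_id}/ai/run/{model}`). The account ID and API token are placeholders, and the actual network call is only described in a comment.

```python
import json

API_BASE = "https://api.cloudflare.com/client/v4"

def run_url(account_id: str, model: str) -> str:
    """Build the Workers AI inference endpoint for a given account and model."""
    return f"{API_BASE}/accounts/{account_id}/ai/run/{model}"

# Chat-style payload using the common role/content message shape.
payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Workers AI?"},
    ]
}

url = run_url("YOUR_ACCOUNT_ID", "@cf/meta/llama-2-7b-chat-int8")
print(url)
# To actually send the request, POST `payload` as JSON to `url` with an
# `Authorization: Bearer <API_TOKEN>` header (e.g. via the `requests` library).
```

The same call shape works for any model in the catalog; only the model segment of the URL changes.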
The company says its goal is to make Workers AI the most affordable solution for powering inference. To that end, it has made several optimizations since the beta, including a 7x reduction in the cost of running Llama 2 and a 14x reduction in the cost of running the Mistral 7B model.
“The recent generative boom in artificial intelligence has seen companies across industries invest huge amounts of time and money in artificial intelligence. Some of it will work, but the real challenge of AI is that the demo is simple, but putting it into production is incredibly difficult,” said Matthew Prince, CEO and Co-Founder, Cloudflare. “We can solve this by abstracting away the cost and complexity of building AI-powered applications. Workers AI is one of the most affordable and accessible inference solutions.”
Cloudflare has also improved load balancing: requests are now routed across multiple cities, and each city understands its total available capacity. This means that if a request would need to be queued in one city, it can instead be routed to another. The company currently has inference GPUs in more than 150 cities worldwide and plans to add more in the coming months.
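The routing idea described above can be sketched as follows. This is purely illustrative; Cloudflare has not published its scheduler, so the city names, data shape, and selection policy here are all assumptions.

```python
def route_request(cities):
    """Pick a city with spare GPU capacity; otherwise queue behind the least-backlogged one.

    `cities` maps a city name to a (in_flight, capacity) pair.
    Hypothetical policy, not Cloudflare's actual scheduler.
    """
    # Prefer any city that can take the request without queueing.
    with_room = {c: (used, cap) for c, (used, cap) in cities.items() if used < cap}
    if with_room:
        # Among cities with room, pick the one with the most spare capacity.
        return max(with_room, key=lambda c: with_room[c][1] - with_room[c][0])
    # Every city is full: queue behind the smallest backlog.
    return min(cities, key=lambda c: cities[c][0] - cities[c][1])

fleet = {"lisbon": (8, 8), "frankfurt": (5, 8), "newark": (7, 8)}
print(route_request(fleet))  # frankfurt: the only cities with room are frankfurt and newark
```

The key property the article describes is the first branch: a request is only queued when no city in the fleet has spare capacity.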
Cloudflare has also increased rate limits for all models. Most LLMs are now capped at 300 requests per minute, up from 50 per minute during the beta. Smaller models now have limits between 1,500 and 3,000 requests per minute.
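Clients that approach these per-minute caps typically handle HTTP 429 responses with exponential backoff. The sketch below is a generic client-side pattern, not part of any Cloudflare SDK; the fake `send` callable stands in for a real request.

```python
import time

def call_with_backoff(send, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry a request when the rate limit (HTTP 429) is hit.

    `send` is any zero-argument callable returning (status_code, body).
    Illustrative client-side pattern only.
    """
    for attempt in range(max_retries):
        status, body = send()
        if status != 429:
            return status, body
        # Exponential backoff: 1s, 2s, 4s, ...
        sleep(base_delay * (2 ** attempt))
    raise RuntimeError("rate limited after retries")

# Simulated endpoint that rejects the first two calls, then succeeds.
responses = iter([(429, None), (429, None), (200, "ok")])
status, body = call_with_backoff(lambda: next(responses), sleep=lambda s: None)
print(status, body)  # 200 ok
```

Passing `sleep=lambda s: None` in the demo skips the real delays so the example runs instantly.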
The company also reworked the Workers AI dashboard and AI playground. The dashboard now displays usage analytics across models, and the AI playground allows developers to test and compare different models, as well as configure instructions and parameters, Cloudflare explained.
Cloudflare and Hugging Face have also expanded their partnership, enabling customers to run models available on Hugging Face directly from Workers AI. The company currently offers 14 Hugging Face models, and as part of the GA release it has added four new ones: Mistral 7B v0.2, Nous Research's Hermes 2 Pro, Google's Gemma 7B, and Starling-LM-7B-beta.
“We’re excited to work with Cloudflare to make AI more accessible to developers,” said Julien Chaumond, co-founder and CTO, Hugging Face. “Offering the most popular open models with a serverless API powered by a global fleet of GPUs is an amazing proposition for the Hugging Face community, and I can’t wait to see what they do with it.”
Another new addition is Bring Your Own LoRAs (low-rank adaptations), which lets developers adapt a model by training only a small subset of additional parameters rather than all of the model’s weights. According to Cloudflare, this feature allows developers to get fine-tuned model results without going through the full fine-tuning process.
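To see why LoRA is cheap, compare parameter counts: LoRA freezes the base weight matrix W (d_out × d_in) and learns only a low-rank delta B·A, with B of shape (d_out × r) and A of shape (r × d_in). The dimensions below are illustrative, not taken from any specific model.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int):
    """Compare trainable parameters: full fine-tuning vs. a LoRA update.

    Full fine-tuning trains all d_out * d_in weights of W.
    LoRA trains only B (d_out x rank) and A (rank x d_in).
    """
    full = d_out * d_in
    lora = d_out * rank + rank * d_in
    return full, lora

# A single 4096x4096 projection layer with a rank-8 adapter (illustrative sizes).
full, lora = lora_param_counts(4096, 4096, rank=8)
print(full, lora, f"{lora / full:.2%}")  # the adapter is well under 1% of the layer
```

Because the base weights never change, one frozen model can serve many users, each supplying their own small adapter, which is what makes a bring-your-own-LoRA offering practical.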