Generative AI has progressed rapidly in recent years. AI models keep getting better at tasks such as text summarization, question answering, and chat. For example, Bing Copilot has seen several improvements by taking advantage of GPT-4 technology, and Google has announced its Gemini and Bard models. However, training and fine-tuning such models requires massive computing infrastructure and is expensive. This is a major barrier to AI adoption, because few players in the market can design large language models from scratch. Building models or applications on top of an existing base model is one way around this problem: companies do not have to create a core model themselves, and can instead use an existing one directly or fine-tune it to their needs.
MaaS and MaaP
Model as a Service (MaaS) refers to a cloud-based service in which machine learning or generative AI models are hosted in the cloud and made readily available through simple chat-based APIs. Ease of use and a gentle learning curve have accelerated the adoption of these services. Overall, MaaS simplifies model consumption.
Model as a Platform (MaaP) differs from MaaS in that model providers gain access to the underlying infrastructure offered by the cloud service provider, rather than the cloud provider directly exposing access to their model. In such cases, the model vendor may take care of building, deploying, and managing its machine learning applications itself, using that cloud infrastructure. MaaP enables organizations to create end-to-end ML solutions.
MaaS building blocks
Three parties are involved in this process: the model provider, the model publisher, and the model consumer. A model provider is usually the one who creates the model, which can be open source or closed source (e.g., OpenAI, Hugging Face). A model publisher can be a cloud provider that accepts the model from the model provider and makes it available for use by consumers (e.g., Amazon, Microsoft). Model consumers, such as chat apps and bots, use the models that the model publisher has made available.
For a model provider to publish a model for consumption as a service, several basic steps may be involved, depending on the cloud provider they choose:
- Model registry: Model providers use the cloud provider's registry service to supply the model artifacts and associated metadata, such as weights, parameters, bin files, and safetensors files.
- Model catalog: A repository of models available for consumption. It can expose a base model directly, or a model built on top of a base model.
- Model endpoints: Model vendors may want to specify the compute they wish to provision as part of their model deployment. The cloud service can then expose an endpoint that consumers use to access the model.
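The three steps above can be sketched as a small in-memory model. This is only an illustration of how the registry, catalog, and endpoint relate to each other. The class names, fields, compute label, and the maas.example.com URL are all hypothetical and do not correspond to any particular cloud provider's API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RegisteredModel:
    """What a provider might submit to a registry (fields are illustrative)."""
    name: str
    base_model: Optional[str] = None  # None means this entry is a base model
    artifacts: list = field(default_factory=list)  # e.g. weight file names


@dataclass
class Endpoint:
    """A deployed model: which model, on what compute, reachable at what URL."""
    model_name: str
    compute: str
    url: str


class ModelPublisher:
    """Toy stand-in for a cloud provider's registry, catalog, and deployment."""

    def __init__(self):
        self._registry = {}
        self._endpoints = {}

    def register(self, model):
        # Step 1: the provider registers the model and its artifacts.
        self._registry[model.name] = model

    def catalog(self):
        # Step 2: the catalog lists every registered model, base or derived.
        return sorted(self._registry)

    def deploy(self, model_name, compute):
        # Step 3: provision the requested compute and expose an endpoint.
        if model_name not in self._registry:
            raise KeyError(f"{model_name!r} is not registered")
        # Hypothetical URL scheme, for illustration only.
        url = f"https://maas.example.com/models/{model_name}/generate"
        endpoint = Endpoint(model_name, compute, url)
        self._endpoints[model_name] = endpoint
        return endpoint
```

A provider would call `register` once per model, consumers would browse `catalog()`, and `deploy` would return the endpoint that consumers then call. Real services add authentication, versioning, and billing on top of this flow.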
MaaS interface
Typically, MaaS offerings are chat-based interfaces that generate text in response to a prompt. Sampling parameters such as temperature, repetition penalty, top k, and max tokens can also be sent in the request. Below is an example of a hello world request sent to the facebook/opt-125m model, served on localhost. In the request, we send several sampling parameters that the model accepts. As we can see, the model responds with generated text as output.
curl http://0.0.0.0:5001/generate -H "Content-Type: application/json" -d '{
  "model": "facebook/opt-125m",
  "prompt": "Hibiscus is a beautiful",
  "max_tokens": 20,
  "temperature": 0.8,
  "top_p": 0.95
}'

{"text":["Hibiscus is a beautiful plant. It will grow and live for years to come."]}
Conclusion
Model as a Service (MaaS) offers a convenient and efficient way to use pre-trained machine learning models for specific tasks without incurring the cost of developing a model. It allows organizations to focus on their core applications instead of building models themselves. Google, Microsoft, and Amazon are notable cloud providers that offer MaaS, and the list of supported models is expected to grow as new model providers emerge. The cloud infrastructure behind these services also needs to scale well to support the models.