Siri will get smarter, and soon it may understand how iPhone apps work

New Delhi, UPDATED: April 10, 2024 at 10:49 AM IST

If you’re a kid from the 90s, chances are the first AI (artificial intelligence) tool you encountered was Siri. The AI-powered voice assistant was introduced with the iPhone 4S in 2011. Whether it’s answering a call or setting an alarm, Siri has made our lives easier and has been quite fun to interact with. However, we haven’t seen any major Siri-related announcements in the past few years. Now that artificial intelligence is in the spotlight following the launch of OpenAI’s chatbot ChatGPT, reports suggest that Siri may also become smarter in the future.

Reports of Apple working on generative AI features for Siri have been circulating for some time. Now, a research paper published by Apple researchers on arXiv (the preprint server hosted by Cornell University) describes a new MLLM (Multimodal Large Language Model) that can understand how a phone’s interface works. The paper, titled Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs, explains that while the technology has come a long way, general-purpose models still fall short when it comes to interacting with a screen’s user interface.

Ferret-UI, built on the Ferret model Apple released last October, is an MLLM being developed to understand UI screens and, potentially, how apps work on a phone. According to the paper, the model is designed with “referring, grounding, and reasoning capabilities.”

One of the primary challenges in improving AI’s understanding of app screens lies in the elongated aspect ratios and small visual elements of smartphone displays. Ferret-UI tackles this by magnifying details: each screen is divided into sub-images based on its aspect ratio, and these are encoded alongside the full screen so that even the smallest icons and buttons remain legible to the model. The paper also says that, after training on a curated set of UI tasks, Ferret-UI outperformed existing models at understanding and interacting with application interfaces. If Ferret-UI is incorporated into Apple’s Siri voice assistant, the tool could become considerably smarter.
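To make that splitting step concrete, here is a minimal Python sketch of the idea. This is not Apple’s code; the file name and function are purely illustrative, and in the real Ferret-UI pipeline these crops would be fed into an image encoder rather than simply collected in a list:

```python
# Illustrative sketch of splitting a screenshot by aspect ratio,
# as described in the Ferret-UI paper. Uses Pillow (pip install pillow).
from PIL import Image

def split_screen(img: Image.Image) -> list[Image.Image]:
    """Split a screenshot into two sub-images along its longer axis,
    so small UI elements occupy a larger share of each encoded crop."""
    w, h = img.size
    if h >= w:
        # Portrait screen: top and bottom halves
        return [img.crop((0, 0, w, h // 2)), img.crop((0, h // 2, w, h))]
    # Landscape screen: left and right halves
    return [img.crop((0, 0, w // 2, h)), img.crop((w // 2, 0, w, h))]

screenshot = Image.open("home_screen.png")  # hypothetical input file
# The model sees the full screen plus each magnified sub-image.
crops = [screenshot] + split_screen(screenshot)
```

Because each crop is encoded at full resolution, a button that would shrink to a few pixels in a downscaled full-screen view stays large enough for the model to recognise.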

In the future, the digital assistant could perform complex tasks within applications. Imagine asking Siri to book a flight or make a reservation and watching it seamlessly interact with the appropriate app to fulfill the request.

Speaking of Ferret, it’s an open-source multimodal large language model published by Apple and Columbia University, the result of extensive research into how large language models can recognise and understand elements within images. This means a system with Ferret underneath can handle image-grounded queries much as ChatGPT or Gemini handle text-based ones. Ferret was released for research purposes in October last year.

Posted by: Divyanshi Sharma

Posted on: April 10, 2024
