OpenAI’s newest artificial intelligence (AI) model, GPT-4o, was released on Monday and promises improvements in text, vision, and audio capabilities.
During a live presentation on Monday, OpenAI Chief Technology Officer Mira Murati described the model as a “huge step forward with the ease of use” of the system. The launch came just one day before Google’s annual developer conference on Tuesday.
Here’s what you should know about GPT-4o’s launch.
Enhanced Voice Guidance
According to OpenAI, users can show GPT-4o multiple photos and converse with the model about the submitted images.
This could help children solve arithmetic problems step by step: during Monday’s unveiling, one demonstration walked viewers through a basic math problem without giving away the answer.
The new model can also tutor students in real time, as shown in a separate video released by the online education organization Khan Academy. In the clip, a student shares his screen with GPT-4o while the model walks him through the problem.
A Faster Model With Enhanced Features
Murati said on Monday that GPT-4o offers “GPT-4 level intelligence” while being faster and improving the system’s text, vision, and audio capabilities.
“This is really shifting the paradigm into the future of collaboration, where this interaction becomes much more natural and far, far easier,” she said.
OpenAI says the new model can “respond to audio inputs in as little as 232 milliseconds,” with an average of 320 milliseconds, roughly how long it takes a human to reply in conversation.
The New Model Was Released On Monday
GPT-4o will be accessible to all ChatGPT users, including those on the free tier, starting Monday.
“GPT-4o’s text and image capabilities are starting to roll out today in ChatGPT. We are making GPT-4o available in the free tier, and to Plus users with up to 5x higher message limits,” OpenAI wrote in its update Monday.
OpenAI CEO Sam Altman said on the social media site X that ChatGPT Plus customers will be able to access the new voice mode in the coming weeks.
The Model Is “Natively Multimodal”
Altman also said on X that the model is “natively multimodal,” meaning it can generate content and understand commands delivered through text, voice, or images.
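For readers curious what that looks like in practice, here is a minimal sketch of a mixed text-and-image request using OpenAI’s Python SDK. The prompt text and image URL are placeholders invented for illustration, not details from the announcement.

```python
# Minimal sketch: sending text and an image in one request via the OpenAI Python SDK.
# The question and image URL below are hypothetical placeholders.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            # A single message can mix text and image parts; this is what
            # "natively multimodal" means in practice for API callers.
            "content": [
                {"type": "text", "text": "What math problem is shown here, and what is the first step?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/worksheet.png"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```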
In a separate blog post, Altman declared that the new voice and video mode “is the best computer interface” he has ever used.
“It feels like AI from the movies; and it’s still a bit surprising to me that it’s real. Getting to human-level response times and expressiveness turns out to be a big change,” he wrote Monday.