GPT-4 Omni: The future of Multimodal AI

PLUS: OpenAI released ChatGPT for Video chat, a human-like voice AI assistant and Desktop App for ChatGPT

Shubham Saboo and Gargi Gupta

May 14, 2024

OpenAI releases its new flagship model: GPT-4o
OpenAI’s new Voice Assistant is fast and sounds super human-like
GPT-4o is also available to free users, along with other features
Google also plans to launch a Voice Assistant that can process video
Google’s Project Starline to be commercially released in 2025

& so much more!

Read time: 3 mins

💡 Build RAG Agents for Internal data with Nocode

Join this FREE webinar to learn how to automate AI workflows, enhance data retrieval, and build efficient RAG agents for real-time decision-making.

Register now before the spots get filled!

Latest Developments 🌍

Everything Launched at OpenAI Spring Event 🌟

OpenAI just wrapped up its Spring Event. The hype was at its peak, and the half-hour event may not have met every expectation, partly because the demos did not reveal many of the new model’s capabilities. Here’s EVERYTHING that you can do with it:

New GPT-4o Model: OpenAI has released a new GPT-4 Omni model
1. It brings GPT-4-level intelligence to everyone, but 4x faster. It is also available to free users. Paid users will get 5x higher rate limits than free users.
2. It can process text, voice and images end-to-end, which improves its capabilities across all three. This also means it doesn’t need any other model to process and output these modalities. For instance, if you want to generate an image in ChatGPT, it won’t use the DALL·E model; GPT-4o will do it itself.
3. It can chat in and translate 50 languages that cover 97% of the world’s population.
4. Do check out the demos on OpenAI’s blog to really understand what this model is capable of.

GPT-4o API: GPT-4o is not limited to ChatGPT; it’s also coming to the OpenAI API. Compared to GPT-4 Turbo, it is 2x faster, 50% cheaper and comes with 5x higher rate limits.
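
To make the “50% cheaper” claim concrete, here is a rough Python sketch of the arithmetic. The baseline prices and token counts below are hypothetical placeholders chosen for illustration, not figures from the announcement, and the function names are our own.

```python
def discounted_price(baseline_price: float, discount: float = 0.5) -> float:
    """Return the price after a fractional discount (0.5 means 50% cheaper)."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    return baseline_price * (1.0 - discount)


def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_million: float,
                 output_price_per_million: float) -> float:
    """Cost in dollars of one request, given per-million-token prices."""
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


# Hypothetical baseline prices in dollars per 1M tokens (assumption).
BASELINE_INPUT_PRICE = 10.0
BASELINE_OUTPUT_PRICE = 30.0

# Hypothetical request size: 2,000 input tokens and 500 output tokens.
baseline = request_cost(2_000, 500, BASELINE_INPUT_PRICE, BASELINE_OUTPUT_PRICE)
cheaper = request_cost(2_000, 500,
                       discounted_price(BASELINE_INPUT_PRICE),
                       discounted_price(BASELINE_OUTPUT_PRICE))

print(f"baseline: ${baseline:.4f}, 50% cheaper: ${cheaper:.4f}")
```

Because both the input and output prices are halved, the cost of any request is halved too, whatever its mix of input and output tokens.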

ChatGPT Desktop App: The new desktop app makes it easier to get ChatGPT’s help while working on something else. With a simple keyboard shortcut (Option + Space), you can instantly ask ChatGPT a question. It will be available soon.

New Voice Assistant: OpenAI has released a new Voice Assistant that is much more capable and faster than the existing Voice Mode in ChatGPT. The new Voice Assistant is powered by GPT-4o, which can natively handle text, audio and images.
1. This is different from the current Voice Mode, which uses three different models: a transcription model, GPT-3.5 or GPT-4 for intelligence, and a text-to-speech model. Orchestrating these three models together adds a lot of latency.
2. Since GPT-4o powers the Voice Assistant single-handedly, it is near real-time without awkward pauses.
3. It can be interrupted easily to change the topic or question. It is capable of talking in a wide range of emotions and styles. Just tell it if you want it to talk with more emotion, and it will!
4. Video chat with Voice Assistant: It can see while giving voice assistance. You can turn on your camera and show it what you’re doing, and it will help you step-by-step to complete your work. This helps with so many use cases, be it solving math problems, guiding you through code, helping with cooking, and more.
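
The latency point above can be sketched in a few lines of Python. When models are chained one after another, their delays add up, while a single end-to-end model has only one delay. All stage names and millisecond values below are hypothetical numbers picked for illustration, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    latency_ms: float  # hypothetical per-stage latency, not a measured value


def pipeline_latency(stages: list[Stage]) -> float:
    """Stages run one after another, so their latencies add up."""
    return sum(stage.latency_ms for stage in stages)


# Old Voice Mode: three models chained in sequence (illustrative numbers).
cascaded = [
    Stage("speech-to-text", 300.0),
    Stage("text model", 1500.0),
    Stage("text-to-speech", 400.0),
]

# New Voice Assistant: one model handles audio in and audio out.
single_model = [Stage("end-to-end audio model", 350.0)]

print(pipeline_latency(cascaded), "ms vs", pipeline_latency(single_model), "ms")
```

The chained setup also passes only text between stages, which is one plausible reason a single model can respond to tone and emotion more naturally.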

Since GPT-4o is also available to free ChatGPT users, they can now:
1. Upload images and files and ask questions about them, and analyze data and charts through advanced data analysis.
2. Browse the web directly through ChatGPT and get the latest information.
3. Use GPTs and discover them on the GPT Store.
4. Use the “Memory” feature, where ChatGPT remembers the preferences you tell it in chats.

Google is Releasing a New Voice Assistant Too 📲

We aren’t sure if Google was trying to overshadow OpenAI’s event today, but it released a short demo video of its new Voice Assistant, which has capabilities similar to OpenAI’s. The demo shows the Voice Assistant answering questions in real-time on a video call.

Though it feels a little underwhelming compared to OpenAI’s demo, which is faster and more expressive, Google has an edge because its model can natively process video while GPT-4o can’t. We’re eagerly waiting to see what Google has in store at tomorrow’s I/O event!

Also, Google is planning to bring Project Starline to the commercial market in 2025, in partnership with HP. Project Starline uses AI and 3D imaging to create an incredibly lifelike “magic window” for video communication that makes it feel like the participants are in the same room.

Testing has shown that this technology improves attentiveness, memory recall, and the overall sense of presence compared to traditional video calls. It will be integrated with existing video conferencing platforms like Meet and Zoom.

😍 Enjoying it so far? Share it with your friends!

Share Unwind AI

Tools of the Trade ⚒️

SheetMagic: Use AI and web scraping directly within Google Sheets. You can generate content and images with AI, scrape live data from websites for researching, and analyze and clean data, all without leaving your spreadsheet.

Cleaveer: Convert YouTube videos into different content formats like blog posts, Twitter threads, and LinkedIn posts. You can use these converted materials as-is or edit them further using Cleaveer’s built-in editor. It also creates text summaries of the videos to save you time and effort.

Bumpgen: Use AI to automatically fix code broken by dependency upgrades. Just provide the package name and new version, along with your OpenAI API key, and Bumpgen will generate the fixes. It supports TypeScript and works with tools like Dependabot.

Awesome LLM Apps: Build awesome LLM apps using RAG for interacting with data sources like GitHub, Gmail, PDFs, and YouTube videos through simple text prompts. These apps will let you retrieve information, engage in chat, and extract insights directly from content on these platforms.

Hot Takes 🔥

I don't think many of the people working on AI in the major labs have thought enough about measurement of AI abilities, which is a prerequisite to understanding if AGI is possible, and when. The fact that everyone is using terrible benchmarks is indicative of the problem. ~ Ethan Mollick

Siri: "Designed in California. Made by OpenAI." ~ Santiago

AI is cool, but the real Internet still runs on clicks and eyeballs. ~ Bojan Tunguz

Meme of the Day 🤡

Memers prep before the event today!

Real-time AI Updates 🚨

⚡️ Follow me on Twitter @Saboo_Shubham for lightning-fast AI updates and never miss what’s trending!

PS: I curate this AI newsletter every day for FREE, and your support is what keeps me going. If you find value in what you read, share it with your friends by clicking the share button below!