Google's Gemini 1.5 Pro, OpenAI GPT-4 Turbo API and Mistral MoE 8x22B 💥

PLUS: Google's NoCode AI Agent platform, AI video creation platform and Imagen 2.0 for live images, Opensource takes a lead with Mistral MoE, Gemma and Llama-3

Shubham Saboo

Apr 10, 2024

Today’s top AI Highlights:

Google’s Gemini 1.5 Pro can now take audio inputs
New Gemma models for coding and research
Generate short, 4-second live images from text with Imagen-2.0
OpenAI integrates vision capability in GPT-4 Turbo API
Meta confirms Llama 3 models are coming this month
Mistral AI launches a new Mixture-of-Experts model

& so much more!

Read time: 5 mins

Latest Developments 🌍

Google Rolls out Gemini 1.5 Everywhere! 🌏

Gemini 1.5 Pro can now listen: Gemini 1.5 Pro can now process audio inputs like lecture or earnings calls and carry out various tasks, available in both Gemini API and Google AI Studio.

Updates to Gemini API:

System instructions: Guide the model’s responses for your use case. Define roles, formats, goals, and rules to steer its behavior.
JSON mode: Instruct the model to only output JSON objects. This mode enables structured data extraction from text or images.
Improvements to function calling: Choose text, function call, or just the function itself to limit the model’s outputs.

More models in Gemma suite of models:

CodeGemma models: Powerful yet lightweight coding models, 7B variant for code completion and code generation tasks, 7B instruction-tuned for code chat and instruction-following, and 2B variant for fast code completion that fits on your local computer.

image of streamlined workflows within an exisitng AI dev project with CodeGemma integrated

RecurrentGemma: An efficiency-optimized architecture for research experimentation. It leverages recurrent neural networks and local attention to improve memory efficiency. Great for researchers who want to create or study large chunks of text efficiently.
Gemma 1.1 Models: Built over Gemma 7B and 2B models, substantial gains on quality, coding capabilities, factuality, instruction following and multi-turn conversation quality.

New Imagen 2.0: Google’s text-to-image model can now generate short, 4-second live images from text prompts along with image editing capabilities like inpainting/ outpainting and digital watermarking.

New features in Google Workspace:

Google Vids: AI-powered video creation app for storytelling in work. It lets you generate storyboards, edit them, and assemble videos with stock footage, images, background music, and voiceovers.
Voice prompting in Gmail: Send emails easily when you’re on the go with voice input in Help me write, and convert rough notes to a complete email with one-click.
AI in Google Meet: Take notes for me lets AI take notes for you. Translate for me (coming in June) will automatically detect and translate captions in Meet.
AI Security Add-on: AI-powered threat defenses and the capability to classify and protect sensitive files in Google Drive.
Integrated Vertex AI: Build custom AI agents easily. Choose from 130 models available, and bring that AI agent into the productivity apps in Workspace.

Axion - Google’s first custom Arm-based CPUs designed for the data center:

Companies are preferring Arm-based CPUs as they are more energy efficiency and offer flexibility in addressing diverse computational workloads.
Axion processors deliver up to 30% better performance than the fastest general-purpose Arm-based instances available in the cloud today.
They offer up to 60% better energy efficiency than comparable current-generation x86-based instances.

Gemini goes deep in Google Cloud:

Gemini Code Assist: Helps developers build apps in code editors like VS Code and JetBrains. It is an enterprise-grade coding assistant that supports your private codebase wherever it lives — on-premises, Gitlab, Github, Bitbucket, or even across multiple repositories.
Gemini Cloud Assist: Helps cloud teams seamlessly design, deploy, and optimize their application lifecycles on Google Cloud. It also offers tailored AI guidance for architecture design, troubleshooting, and performance enhancements.
Gemini is also being integrated for Security Ops to power the entire investigation lifecycle, including conversational threat intelligence. Gemini in BigQuery will assist data professionals with AI-powered analytics. It is also coming in Databases to help teams manage, optimize and govern an entire fleet of databases from a single dashboard.

Vertex AI Agent Builder: Lets you easily build and deploy enterprise-ready conversational AI agents.

Includes a no-code console for building AI agents using Google’s Gemini models, advanced agent orchestration, and maintenance tools
Provides comprehensive tools for grounding model outputs in enterprise data through its out-of-the-box RAG system, vector search, and the option to integrate with Google Search.
Enhances functionality of these AI agents with extensions for connecting to specific APIs or tools, function calling, and data connectors for integrating with enterprise and third-party applications.

https://storage.googleapis.com/gweb-cloudblog-publish/original_images/new_name6.gif

GPT-4 Turbo’s API gets Vision Capability 👀

OpenAI has rolled out updates to GPT-4 Turbo model, now having vision capability, generally available via API. The new functionality supports JSON formatted requests and the use of function calling, making it easier for developers to automate actions into their applications, like changing content for a website, automate cataloging in libraries, or even executing an online purchase.

Llama 3 will be out this month 🤩

Meta has confirmed at an event that it plans to launch Llama 3 within the next month, marking the next phase in its opensource language foundation model development. Nick Clegg, Meta’s president of global affairs, said that Llama 3 suite of models will be a number of different models with different capabilities and versatilities which will be released during the course of this year, starting very soon.

It was also reported that the launch will start with Meta releasing two smaller versions of Llama 3 first, probably 7B and 13B sized, but Meta hasn’t confirmed this yet.

It is crucial for Meta to strategically plan its releases to get maximum attention and avoid what happened with Google, when it announced Gemini 1.5 which OpenAI’s Sora announcement overshadowed.

Our Opinion: We believe the biggest Llama 3 model will be multimodal and will be as good as GPT-4. Let us know what you think in the comments below! 👇

Intel Challenges NVIDIA’s Dominance with New AI Accelerator 💨

Intel has released the architectural details of its third-generation AI chip, Gaudi 3, at an event in Phoenix. This release comes amid the hot competition in AI accelerator chips, where Nvidia has been a dominant player. Gaudi 3, with its enhanced capabilities and efficiency, is Intel’s latest effort to outpace Nvidia’s dominance and capture more market.

Key Highlights:

Architecture: Designed for generative AI tasks, it features a compute engine with 64 customizable Tensor Processor Cores and 8 Matrix Multiplication Engines. Each MME can perform 64,000 parallel operations, optimizing complex matrix computations essential for deep learning.
Performance: It delivers 4x AI compute for BF16, 1.5x increase in memory bandwidth, and 2x networking bandwidth for massive system scale out compared to its predecesso.
AI Training and Inference: Compared to Nvidia’s H100, Gaudi-3 would take half the time to train Llama 2 7B and 13B models, and GPT-3 175B parameter model. Additionally, Intel Gaudi 3 accelerator inference throughput is projected to outperform the H100 by 50% and 30% faster inferencing.

Mistral AI quietly Drops a new MoE Model 🤖

Mistral AI has very casually released a brand new 8x22B Mixture of Experts model on Twitter. Here are the details we have:

65k context length
Uses 48 attention heads
Includes 8 expert models and allows 2 experts per token

😍 Enjoying so far, share it with your friends!

Share Unwind AI

Tools of the Trade ⚒️

Writio: Create your team of AI writers that can adopt your unique voice and writing style, and scale your content production while avoiding common pitfalls like unhelpful content and AI detection. You can give these AI writers a specific background and niche or teach your style, then give them multiple topics at once, and then relax!

Shortwave: The smartest AI-powered email app. It can create emails in your unique style, retrieve answers to your questions by searching your mail history, translate, schedule meetings, organize and group your emails, use delivery schedules to receive emails exactly when you want, and much more!
MathDash: Competitive math fosters critical thinking, problem-solving skills, and a deeper understanding of mathematical concepts, but is not accessible to everyone. MathDash offers a platform where you can engage in daily competitions and live matches across various difficulty levels, from basic arithmetic to complex theory, and enhance your math skills.

Hot Takes 🔥

One of the most exciting things about AI is that it enables people without deep expertise / experience in an area to quickly become extremely competent in that space.
New college / HS grads can tackle problems that would have taken a masters 5 years ago. ~
Logan Kilpatrick
The AGI race is more consequential than the Space Race. ~
Bindu Reddy

Meme of the Day 🤡

Is this real?! (or AI-generated)

That’s all for today! See you tomorrow with more such AI-filled content.

Real-time AI Updates 🚨

⚡️ Follow me on Twitter @Saboo_Shubham for lightning-fast AI updates and never miss what’s trending!

PS: I curate this AI newsletter every day for FREE, your support is what keeps me going. If you find value in what you read, share it with your friends by clicking the share button below!