Open-Source Alternative to OpenAI GPTs
PLUS: Qwen Outperforms GPT-3.5 & Claude, Text-to-3D in seconds
Today’s top AI Highlights:
The New Qwen Model Outperforms GPT-3.5 and Claude on MT-Bench and Alpaca-Eval
A 1.4x Speed Boost in Problem-Solving Through Chain-of-Abstraction Reasoning
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities
Hugging Face releases Hugging Chat Assistant to build custom GPTs
Text-to-3D with Unmatched Geometry, Texture, and Style within seconds
& so much more!
Read time: 3 mins
Latest Developments 🌍
“Good” AI Model Across Six Sizes 🔢
Qwen models have received a significant update aimed at improving both the developer experience and the models' alignment with human preferences. This iteration includes base and chat models across six sizes: 0.5B, 1.8B, 4B, 7B, 14B, and 72B, along with quantized variants, all supporting a context length of 32k tokens. Most notable is the performance: Qwen1.5-72B outperforms well-known models such as GPT-3.5 and Claude on several benchmarks, trailing only GPT-4. A minimal loading sketch follows the highlights below.
Key Highlights:
Qwen1.5 demonstrates exceptional multilingual capabilities, evaluated across a diverse set of 12 languages covering exams, understanding, translation, and math. This extensive testing showcases the model's ability to comprehend and generate high-quality content in languages ranging from Arabic to Japanese and Spanish, highlighting its potential for global applications.
The model's capability to integrate with external systems has been rigorously tested through benchmarks like Retrieval-Augmented Generation (RAG) and agent performance on T-Eval. Qwen1.5-72B has shown competitive abilities in handling tasks that require integration with external knowledge sources and tools, indicating its potential as a robust AI agent.
In detailed performance evaluations, Qwen1.5 models have shown strong results across a variety of benchmarks. Notably, the 72B model outperforms previous-generation models like Llama2-70B across all traditional benchmarks and surpasses Claude-2.1, GPT-3.5-Turbo, and Mixtral-8x7b-instruct on both MT-Bench and Alpaca-Eval v2.
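If you want to try the models yourself, the chat variants are on the Hugging Face Hub and load with standard transformers. Here is a minimal sketch (the model ID is one of the released sizes; generation settings are assumptions, so check the model card):

```python
# Minimal sketch: loading a Qwen1.5 chat model via Hugging Face transformers.
# Requires a recent transformers release; see the model card for specifics.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-7B-Chat"  # one of the six released sizes
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Explain MT-Bench in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```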
Efficient Tool Use with Chain-of-Abstraction Reasoning ⛓️
The efficiency of LLMs in multi-step reasoning has been a pressing challenge, particularly when it comes to seamlessly incorporating external tools for accurate, real-world knowledge application. Addressing this, researchers have introduced a method named Chain-of-Abstraction (CoA) reasoning, designed to enhance LLMs' ability to use these tools through abstract reasoning chains. The method streamlines reasoning by letting decoding and tool calls run in parallel, making inference ~1.4x faster on average than traditional tool-augmented LLMs, while also significantly improving reasoning accuracy. A toy sketch of the pattern follows the highlights below.
Key Highlights:
The CoA method trains LLMs to generate abstract reasoning chains with placeholders that are later filled with specific information from domain tools, fostering more general and adaptable reasoning strategies. This strategy has demonstrated an impressive ~6% absolute improvement in QA accuracy across different reasoning tasks.
When applied to mathematical reasoning and Wiki QA domains, the CoA method notably enhanced performance, yielding average absolute accuracy improvements of 7.5% and 4.5%, respectively. These gains held across both in-distribution and out-of-distribution test sets. Moreover, models trained with CoA showed significantly faster inference, running ~1.47x and ~1.33x quicker on mathematical and Wiki QA tasks, respectively.
Tested through extensive human evaluations and comparisons against existing baselines, the approach led to an approximate 8% reduction in reasoning errors, underscoring its potential to enhance the accuracy of LLMs in complex reasoning scenarios.
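To make the pattern concrete, here is a toy Python sketch of the CoA idea: the model first emits an abstract chain with placeholders (y1, y2, ...), and simple domain tools fill them in afterward. The chain format, tool names, and stubbed model call are illustrative assumptions, not the paper's exact protocol:

```python
import re

# Stand-in for a CoA-finetuned LLM: it returns an abstract reasoning chain
# with tool calls and placeholders instead of computed numbers.
# (Hypothetical output format, for illustration only.)
def call_llm(question: str) -> str:
    return "[add(20, 35) = y1]; [mul(y1, 2) = y2]; so the answer is y2."

# Domain tools that fill in the placeholders after decoding finishes.
TOOLS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}

def fill_chain(chain: str) -> str:
    values = {}
    # Resolve each [op(args) = placeholder] call in order; arguments may
    # reference placeholders computed by earlier calls.
    for op, args, out in re.findall(r"\[(\w+)\(([^)]*)\) = (\w+)\]", chain):
        resolved = [float(values.get(a.strip(), a)) for a in args.split(",")]
        values[out] = TOOLS[op](*resolved)
    # Substitute the computed values back into the chain.
    for name, val in values.items():
        chain = chain.replace(name, str(val))
    return chain

print(fill_chain(call_llm("What is (20 + 35) * 2?")))
# -> [add(20, 35) = 55.0]; [mul(55.0, 2) = 110.0]; so the answer is 110.0.
```

Because the abstract chain is fully decoded before any tool runs, the expensive LLM step and the tool calls for different questions can overlap, which is where the reported speedup comes from.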
LLMs to Master Non-Verbal Speech and Sounds 🔊
Understanding and interpreting audio inputs remains a significant challenge for LLMs, particularly when it comes to non-speech sounds and non-verbal speech cues. Audio Flamingo is a novel audio language model that enhances LLMs' capabilities to comprehend audio. This model not only exhibits a strong understanding of various audio inputs but also excels in adapting to new tasks through few-shot learning and engaging in multi-turn dialogues.
Key Highlights:
Audio Flamingo introduces an audio feature extractor for better temporal capture in variable-length audio and excels in few-shot learning by adapting to new tasks with in-context learning and retrieval, without needing task-specific fine-tuning. This makes it highly effective in understanding a wide array of audio inputs, including complex non-speech sounds.
The model is trained on a diverse dataset of around 5.9 million audio-text pairs using a two-stage method (pre-training followed by supervised fine-tuning) and employs cross-attention mechanisms to integrate audio input into the language model, keeping computational complexity manageable while generalizing well across varied audio challenges.
By training and refining on specialized dialogue datasets, Audio Flamingo shows significant gains in multi-turn dialogue interactions. This marks a step toward more nuanced, prolonged conversations between AI and users, making it suitable for applications that require detailed, ongoing communication.
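As a rough illustration of that cross-attention idea (a generic sketch, not Audio Flamingo's actual architecture), here is a minimal PyTorch block where text tokens attend to audio features; all dimensions and names are made up for the example:

```python
import torch
import torch.nn as nn

class AudioCrossAttentionBlock(nn.Module):
    """Generic cross-attention sketch: text hidden states attend to audio features."""
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, text_hidden: torch.Tensor, audio_feats: torch.Tensor) -> torch.Tensor:
        # Queries come from text tokens; keys/values come from audio features,
        # so the language model can condition on variable-length audio.
        attended, _ = self.attn(text_hidden, audio_feats, audio_feats)
        return self.norm(text_hidden + attended)  # residual connection + norm

block = AudioCrossAttentionBlock()
text = torch.randn(1, 16, 512)    # 16 text tokens
audio = torch.randn(1, 200, 512)  # 200 audio frames from a feature extractor
print(block(text, audio).shape)   # torch.Size([1, 16, 512])
```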
Tools of the Trade ⚒️
Meshy-2: Text-to-3D gets substantial improvements in geometry, texture, and style, letting users create well-structured meshes with rich geometric detail and enhanced texture quality. It offers four distinct styles (Realistic, Cartoon, Low Poly, and Voxel), with previews available in 20 seconds and final results in under 5 minutes.
Hugging Chat Assistant: A free, open-source alternative to the OpenAI GPT Store from Hugging Face for creating customizable AI assistants using open-source models like Llama 2 and Mixtral. You can create an assistant in just 2 clicks, choose from various open-source models, and customize its behavior with system prompts. It does not yet support features like RAG, web search, or image generation.
LLM Automator: A macOS tool to create your own keyboard shortcuts for generating text using various language models, including OpenAI GPT, Ollama, and HuggingFace, by simply selecting text in any application and pressing a customizable shortcut key.
Zenfetch: A browser extension that turns your web browsing (articles, PDFs, and YouTube videos) into a personal AI assistant, letting you easily capture and recall online information, generate instant summaries, and boost your knowledge work and content creation.
😍 Enjoying so far, TWEET NOW to share with your friends!
Hot Takes 🔥
I have a feeling that a well-finetuned Qwen 1.5 might reach the GPT-4 level. ~ Andriy Burkov
It's still super early days for AI. A lot is left to be invented.
- multimodal LLMs are super crappy. They can get 10x better
- image generation doesn't really work, especially if you want a very specific image
- insane amounts of opportunity with robots
- we are at least 3-4 years away from automating professions in a meaningful manner ~ Bindu Reddy

one exciting observation about transformers (and most modern deep learning) is that you can understand them using high school math. really just multiplication, division, sums, and exponentiation, many times, and in a strange and initially hard-to-grok order ~ jack morris
Meme of the Day 🤡
Android users through Apple Vision Pro
That’s all for today!
See you tomorrow with more such AI-filled content. Don’t forget to subscribe and give your feedback below 👇
Real-time AI Updates 🚨
⚡️ Follow me on Twitter @Saboo_Shubham for lightning-fast AI updates and never miss what’s trending!!
PS: I curate this AI newsletter every day for FREE, your support is what keeps me going. If you find value in what you read, share it with your friends by clicking the share button below!