This past week has been another exciting one for the field of Artificial Intelligence, showcasing leaps in technology that continue to push the boundaries of what's possible. Here are 10 AI breakthroughs that you can't afford to miss 🧵👇
App Store for GPTs Coming Next Week
OpenAI launched GPTs at its Dev Day, letting individuals create custom GPTs on their own data. Soon after, the internet was flooded with thousands of GPTs built for different use cases.
OpenAI has announced an app store for GPTs that will let creators publish their GPTs and earn real money. It is still unclear, however, whether the GPT Store will launch with any revenue-sharing scheme.
Low-Cost Robot for doing Home Chores
The challenge of integrating mobility with dexterous manipulation in robotics has been a barrier to the practical application of robots in real-world settings. Stanford University's Mobile ALOHA combines a mobile base and a sophisticated control interface, enabling the robot to efficiently handle real-world tasks like cooking and cleaning. Mobile ALOHA has achieved up to a 90% increase in success rates for tasks such as sautéing shrimp, leveraging a blend of imitation learning and data co-training.
LG's New AI Home Helper
Bringing AI technology right into the heart of our homes, LG Electronics introduced its new smart home AI agent at CES 2024. The agent is a big step towards LG's vision of a "Zero Labor Home," where technology takes over routine tasks, making home management simpler and more efficient. The two-legged wheeled robot can move autonomously around the home and can even express a range of emotions, making it a practical yet futuristic companion for everyday living spaces.
DocLLM: LLM That Understands Layouts
In enterprise documentation, forms, invoices, receipts, and contracts are more than just text; they are a blend of text and visual layout, each playing a crucial role in conveying information. Understanding these complex documents requires a nuanced approach that considers both textual content and spatial arrangement, which traditional LLMs lack. Enter DocLLM from the JPMorgan AI team, built to navigate this multimodal space. By representing layout through lightweight bounding-box information rather than a costly image encoder, DocLLM offers a more efficient and focused method for document analysis.
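To make the idea concrete, here is a minimal sketch of the kind of input a layout-aware model consumes: each token paired with its OCR bounding box instead of raw pixels. The `LayoutToken` name and the sample coordinates are illustrative assumptions, not DocLLM's actual API.

```python
from dataclasses import dataclass

@dataclass
class LayoutToken:
    """A document token paired with its OCR bounding box — the kind of
    lightweight spatial signal a layout-aware LLM can use instead of
    running a full image encoder over the page."""
    text: str
    bbox: tuple  # (x0, y0, x1, y1), normalized page coordinates

# A toy invoice: the spatial positions carry meaning the text alone lacks,
# e.g. that "$1,250.00" sits on the same row as "Total:".
tokens = [
    LayoutToken("Invoice", (0.05, 0.02, 0.25, 0.06)),
    LayoutToken("Total:", (0.05, 0.80, 0.15, 0.84)),
    LayoutToken("$1,250.00", (0.70, 0.80, 0.90, 0.84)),
]
```

Feeding text and box coordinates together is far cheaper than attaching a vision encoder, which is the efficiency argument behind this line of work.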
Improve Text Embedding with Simple Prompts
Microsoft researchers have developed a method for generating high-quality text embeddings using fewer than 1,000 training steps and solely synthetic data. This approach, distinct from traditional methods, leverages GPT-3.5 Turbo and GPT-4 to generate the training data, bypassing the need for multi-stage training and large, manually curated datasets. Notably, the method has set new benchmarks in language coverage and task diversity, covering nearly 100 languages and hundreds of thousands of embedding tasks.
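A key ingredient is that a single model handles many embedding tasks by prefixing each query with a one-sentence task instruction. A minimal sketch of that prompt format follows; the helper name is my own, and the template reflects my reading of the paper rather than an official API.

```python
def build_embedding_input(task_description: str, query: str) -> str:
    """Prepend a natural-language task instruction to the query so one
    embedding model can serve retrieval, clustering, classification, etc.
    Documents are typically embedded without any instruction."""
    return f"Instruct: {task_description}\nQuery: {query}"

example = build_embedding_input(
    "Given a web search query, retrieve relevant passages that answer the query",
    "how do text embeddings work",
)
print(example)
```

Because the task is spelled out in plain language, adding a new embedding task is a matter of writing a new instruction, not retraining.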
Prompt Engineering to Boost LLM Responses by 50%
Prompt engineering is a skill often overlooked, yet it holds the key to unlocking the true potential of LLMs. Recognizing this, a recent study has introduced a comprehensive set of 26 guiding principles specifically designed to enhance the art of prompting LLMs. And these are not just theoretical: applying them yields an average 50% improvement in response quality across different LLMs.
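Several of the principles are mechanical enough to apply in code, such as delimiting the instruction, stating the intended audience, and asking for step-by-step reasoning. A minimal sketch, with the function name and exact wording being my own illustrative choices:

```python
def apply_principles(task: str, audience: str = "") -> str:
    """Assemble a prompt using a few of the study's principles:
    delimit the instruction, name the audience, and nudge the model
    toward step-by-step reasoning."""
    parts = ["### Instruction ###", task]
    if audience:
        # Principle: integrate the intended audience into the prompt.
        parts.append(f"The audience is {audience}.")
    # Principle: add a chain-of-thought style nudge.
    parts.append("Think step by step.")
    return "\n".join(parts)

print(apply_principles("Explain how transformers work.", "a beginner"))
```

The point is not this particular helper but that the principles compose: a thin wrapper like this can enforce them consistently across every prompt an application sends.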
Spatial Data and Temporal Clues for Better Videos
While diffusion models have significantly advanced image-to-image (I2I) synthesis, extending these improvements to video has been challenging due to the difficulty of maintaining consistency across frames. FlowVid addresses this by combining spatial data with temporal optical-flow clues from the source video. Unlike previous methods that relied solely on optical flow, FlowVid handles the imperfections in the estimated flow.
Clone Any Voice with Just a Short Audio Clip
OpenVoice opens a new dimension in voice replication with minimal input requirements. It stands out for its ability to clone a speaker's voice from just a short audio clip, and it extends to multiple languages. It lets you finely control various aspects of voice style, such as emotion, accent, rhythm, pauses, and intonation, a capability that marks a significant advance over previous methods limited to tone-color replication.
High-Quality 3D Rendering on Your Laptop
The challenge of rendering large-scale 3D scenes in real-time on web platforms, especially with limited computing resources like those in laptops, has long been a significant hurdle in computer graphics. City-on-Web is a new system capable of efficiently rendering these complex scenes in web browsers using standard laptop GPUs. Showcasing its capabilities, the system was tested on an NVIDIA RTX 3060 Laptop GPU and achieved a notable performance of 32 FPS at 1080p.
Tiny Model Trained on 3T Tokens in 90 Days 🫰
A 1.1 billion parameter model trained in just 90 days! The TinyLlama project, an open endeavor to pre-train a 1.1B Llama model on a staggering 3 trillion tokens, began training on September 1, 2023, and has now completed its ambitious run. This was achieved on an array of 16 A100-40G GPUs, demonstrating a notable blend of efficiency and power in AI model training.
IMPORTANT 👇👇👇
Stay tuned for another week of innovation and discovery as AI continues to evolve at a staggering pace. Don’t miss out on the developments – join us next week for more insights into the AI revolution!
Click on the subscribe button and be part of the future, today!
📣 Spread the Word: Think your friends and colleagues should be in the know? Click the 'Share' button and let them join this exciting adventure into the world of AI. Sharing knowledge is the first step towards innovation!
🔗 Stay Connected: Follow us for updates, sneak peeks, and more. Your journey into the future of AI starts here!
Shubham Saboo - Twitter | LinkedIn | Unwind AI - Twitter | LinkedIn
> Prompt Engineering to Boost LLM Responses by 50%
I read it a couple of times; the principles are weird in that they contradict each other, like no. 1 and the example for no. 26. It seems the authors aren't applying their own principles :D
Thoughts?
Great summary!
I am curious, what do you use to create the mind maps?