TikTok's Billion-Dollar AI Dance
PLUS: Google's RoboCat, Meta's Voicebox, Opera's AI browser and Musk-Zuckerberg catfight
Hey there
This week we saw quite a few interesting developments in robotics. From affordable home robots that can navigate homes effortlessly to robotic agents that can teach themselves to operate different arms, these advancements are really pushing the boundaries of what robots can do.
Speaking of arms, Stability AI's new image generation models have stolen the limelight this week. People are buzzing about the stunningly beautiful arms these models can create, a remarkable improvement from the days of blurry and blotchy hands.
Besides these, there were many other intriguing events! From Sam Altman being surrounded by controversy again to TikTok's purchase of a staggering $1 billion worth of GPUs from NVIDIA, there's been no shortage of news to grab our attention. Want to know it all? Just keep scrolling!
This issue covers:
Latest Developments
News from the Industry
Tools of the Trade
Hot Takes
AI Meme of the Week
Latest Developments
Our Pick
RoboCat: A self-improving robotic agent: Learns to operate different robotic arms, solves tasks from a small number of demonstrations, and enhances its performance through self-generated data.
Rerender a Video: Zero-shot text-guided video-to-video translation using an adapted diffusion model, achieving temporal coherence without re-training or optimization.
Language to Rewards for Robotic Skill Synthesis: Defining reward parameters to optimize control policies and enable robots to accomplish diverse tasks using LLMs.
DreamHuman: Generates realistic and animatable 3D human avatars from textual descriptions, surpassing previous approaches in visual quality and diversity.
DecodingTrust: Analyzing trustworthiness in GPT models to expose biases, toxicity, privacy breaches, and manipulation potential.
Seeing the World through Your Eyes: Reconstructing 3D scenes from eye reflections in portraits, enhancing eye poses, scene radiance, and iris texture.
Textbooks Are All You Need: Phi-1, a compact language model trained on top-quality textual data, exhibits exceptional accuracy.
MagicBrush: A manually annotated dataset for instruction-guided image editing enabling the training of large-scale text-guided image editing models.
LLMs can label data as well as humans, but faster: SOTA LLMs can label text datasets at the same or better quality but ~20x faster and ~7x cheaper, GPT-4 being the top performer.
Explore, Establish, Exploit: A red teaming framework for evaluating and mitigating harmful outputs of LLMs.
Full Parameter Fine-tuning for LLMs with Limited Resources: A new optimizer called LOMO, which combines gradient computation and parameter update to reduce memory usage.
Demystifying GPT Self-Repair for Code Generation: Examining GPT's self-repair capability in code generation, notably improved with feedback.
VidEdit: Zero-shot text-based video editing that achieves strong temporal smoothness and preservation of the original video structure.
Scaling Open-Vocabulary Object Detection: Self-training on pseudo-box annotations, built on pretrained vision-language models, shows significant improvements.
Macaw-LLM: A multi-modal language model integrating diverse data types to handle complex real-world scenarios.
Perceiver TF: A deep neural network framework that improves multitrack music transcription by accurately transcribing multiple instruments and vocals together.
KoLA: A benchmark to assess LLMs' knowledge-related abilities, using fair data comparisons and contrastive evaluation criteria, resulting in intriguing findings.
HomeRobot: An economical home robot capable of performing tasks using open-vocabulary mobile manipulation.
Language-Guided Music Recommendation for Video via Prompt Analogies: Using natural language prompts and a trimodal model to retrieve music samples that match the video style and user's language query.
Robots Learning from Visual Affordances: Enabling robots to learn tasks from videos for flexible performance in varied environments.
MPT-30B by MosaicML: A new, more powerful open-source LLM licensed for commercial use that offers an 8k context window and outperforms GPT-3.
OpenLLaMA: Open-source reproductions of Meta's LLaMA models demonstrating comparable performance to original LLaMA and GPT-J models.
News from the Industry
Our Pick
Opera has made its AI-powered browser Opera One available for download. It includes Aria (an AI chat assistant) and an intuitive interface with Modular Design, Tab Islands, a Multithreaded Compositor and more!
Stability AI has released SDXL 0.9, an advanced AI text-to-image model which offers realistic imagery along with image-to-image prompting, inpainting and outpainting, now available on the Clipdrop platform.
Meta AI has released text-to-speech AI model Voicebox that can generalize across tasks and synthesize high-quality audio in multiple languages, performing noise removal, content editing, style conversion, and diverse sample generation.
ByteDance, the company behind TikTok, has purchased over $1 billion worth of Nvidia GPUs, an order that alone nearly matches the total of commercial GPUs Nvidia sold in China last year.
While publicly advocating for AI regulation, Sam Altman has lobbied the EU to weaken provisions of the AI Act to reduce the regulatory burden on OpenAI, and proposed amendments that were later added to the final EU law.
President Xi Jinping in a meeting with Bill Gates said he welcomes U.S. firms like Microsoft bringing their AI technology to China, emphasizing the importance of AI in economic development.
Elon Musk, at a meeting with India's PM Modi, asserted that "India has more promise than any large country in the world" and said he plans to bring Tesla to India soon.
China reportedly has a thriving underground market for Nvidia's A100 and H100 chips, whose export to China the US has banned. Vendors procure these chips easily and meet high demand despite prices that are double the usual.
Mercedes-Benz is integrating ChatGPT into its MBUX infotainment system in the US, where it will assist with tasks like navigation, media control and destination details, and hold conversations.
The AI-fueled stock-market boom in 2023 has boosted the fortunes of the world's wealthiest people, with their combined wealth increasing by over $150 billion.
AWS has launched a $100 million program, the Generative AI Innovation Center, aimed at supporting customers and partners working on generative AI projects.
Dropbox has launched Dropbox Ventures, a new $50 million venture fund focused on AI startups that will provide mentorship along with financial support to build AI-powered products.
Google is committing over $20 million to support the expansion of cybersecurity clinics at 20 institutions across the U.S., providing hands-on experience and free security services to students.
Google Cloud has launched Anti Money Laundering AI that helps financial institutions detect money laundering more effectively by providing a consolidated machine learning-generated customer risk score.
The Guardian is adopting generative AI to enhance journalism while safeguarding against bias and maintaining human oversight, emphasizing responsible use and transparency.
YouTube is testing an AI-powered dubbing tool in collaboration with Google's Aloud, allowing creators to automatically dub their videos into different languages.
Over 100,000 ChatGPT accounts containing users' personal information have been compromised, with India hit the hardest.
Tools of the Trade
Our Pick
Dropbox AI: New AI-powered tools released by Dropbox which include universal search that connects all tools, content and apps in a single search bar, smart suggestions, summarizing, collaborative document editor, and more.
Giskard: Testing and debugging solution to detect risks of performance issues, biases, and errors in your model before production.
Warp: Modern, Rust-based terminal with built-in AI that speeds up software development.
Nonoisy: Online audio editing platform that uses AI to remove background noise, master audio, and level volume.
Leet Resumes: Free resume writing service that uses a combination of AI and human expertise to create resumes tailored to individual needs.
NoteGenie: AI-driven note-taking platform with features like identifying key topics, extracting important information, and categorizing notes.
Danelfin: Stock analytics platform that uses AI to analyse over 10,000 features per day per stock and assigns a rating out of 10.
Parallel Domain Data Lab: A synthetic data platform that generates high-fidelity, customisable synthetic data for training computer vision and perception models.
Parrot: AI-powered platform for remote depositions, offering digital reporting, transcription, video conferencing, and live chat solutions.
Factiverse AI - Editor: Find factual mistakes in your text (whether human written or AI generated) and get links to credible sources to verify the information.
EssayGrader: Uses AI to provide feedback on the quality of long essays, summarize, and detect AI-generated text.
MyShell: Create personalized chatbots called Shells, customize your Shell's appearance, voice, and personality, and train it to perform specific tasks.
GPT Engineer: Open-source tool to automatically generate entire well-formatted and functional codebases from simple text prompts.
AI Signature Generator: Create a personalized, professional handwritten signature with AI enhancement.
Upword: AI-powered research tool that generates notes, extracts key ideas, summarizes, simplifies and translates text, and generates audio summaries.
Virtual CMO: AI marketing co-pilot for solopreneurs. Just state the business and its problems and get effective marketing solutions.
Hot Takes
Meme of the Week
That's all for this week!
See you next Saturday with more such content. Don't forget to subscribe and give your feedback below.
BONUS
Share this newsletter with three other friends and stand a chance to win my book GPT-3: The Ultimate Guide to build NLP Products with OpenAI API. Winners will be selected on a monthly basis.
Thanks for the wonderful list. Also please include essay-grader.ai and questionai.io in the list.
Shubham, this is an excellent issue.
Tell me, for use and spreading by "Build in Public" community sites like imminent teleindia.com, singapore.net, indonesia.net and more, are there meta compilations for sub-areas, e.g. LLM semantics, video rendition, LLM to robot interfaces and such?
Or can we, #SpiceTradeAsia_Prompts, commission Unwind.AI to assist in doing so?