In today’s email:

  • 🧮 How AI Discovered a Faster Matrix Multiplication Algorithm

  • 📣 How to create an AI narrator for your life

  • 🧠 How AI could lead to a better understanding of the brain

  • 🧰 9 new AI-powered tools and resources. Make sure to check the online version for the full list of tools.

Top News

Gemini, developed collaboratively by Google teams, is a multimodal model, able to process and combine text, code, audio, images, and video. It is designed to be flexible and efficient, running on various platforms from data centers to mobile devices. Gemini comes in three versions: Ultra, Pro, and Nano, each optimized for different scales and complexities of tasks.

Gemini Ultra has surpassed human experts in MMLU (massive multitask language understanding) and outperformed previous models in various benchmarks, including image understanding and complex reasoning. It’s also highly capable in coding: AlphaCode 2, a code generation system built on a specialized version of Gemini, performs strongly in programming competitions.

Gemini was trained on Google’s Tensor Processing Units (TPUs), which Google credits for its efficiency and scalability. With its multimodal capabilities, Gemini can analyze complex information across various formats, aiding in fields like science and finance.

Gemini is being integrated into Google products including Bard, Pixel, and Search. It will be available to developers and enterprise customers through the Gemini API, with Gemini Ultra set to be released after extensive safety checks and refinements.

Despite the excitement, the Gemini benchmarks appear to be heavily gamed. They seem to be carefully presented and use different prompting techniques to show Gemini beating GPT-4. Since no one can actually use Gemini Ultra to confirm the results, I think it’s safe to assume this is fluff for shareholders, and that once the public gets its hands on the model, it will be apparent that GPT-4 is still the better one: better aligned and with fewer hallucinations.

Creating an AI narrator for your life involves three main components:

  1. Vision Model: This model uses a computer camera to “see” and describe images. The author suggests two options:

    • Llava 13B: An open-source, cost-effective model that gives basic descriptions of images.

    • GPT-4-Vision: A more advanced and slightly more expensive model that provides detailed, nuanced descriptions.

  2. Language Model: This model writes the script for narration. Mistral 7B is used to generate a nature documentary-style script based on the image description provided by the vision model. Alternatively, GPT-4-Vision can be used to both describe the image and generate the script in one step.

  3. Text-to-Speech Model: This converts the written script into spoken audio. The author recommends ElevenLabs’s voice cloning feature for high-quality output or XTTS-v2 as an open-source alternative. These tools can mimic specific voices, enhancing the narration’s realism.
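
The three components above can be sketched as a simple chain in Python. This is only an illustrative outline, not the author's actual code: the `Narrator` class and the `fake_vision`, `fake_language`, and `fake_tts` functions are hypothetical placeholders invented for this example. In a real setup, each placeholder would be swapped for a call to the chosen vision, language, or text-to-speech model.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; none of these are real library APIs.
DescribeImage = Callable[[bytes], str]     # vision model: image bytes -> description
WriteScript = Callable[[str], str]         # language model: description -> script
SynthesizeSpeech = Callable[[str], bytes]  # text-to-speech: script -> audio bytes


@dataclass
class Narrator:
    """Chains the three stages: see, write, speak."""
    describe_image: DescribeImage
    write_script: WriteScript
    synthesize_speech: SynthesizeSpeech

    def narrate(self, frame: bytes) -> tuple[str, bytes]:
        """Run one camera frame through the three stages in order."""
        description = self.describe_image(frame)
        script = self.write_script(description)
        audio = self.synthesize_speech(script)
        return script, audio


# Placeholder stages so the sketch runs without any model or network access.
def fake_vision(frame: bytes) -> str:
    return f"a person holding a cup ({len(frame)} bytes of image data)"


def fake_language(description: str) -> str:
    return f"Here we observe {description}."


def fake_tts(script: str) -> bytes:
    # Stands in for real audio; a real model would return sound data.
    return script.encode("utf-8")


narrator = Narrator(fake_vision, fake_language, fake_tts)
script, audio = narrator.narrate(b"\x00" * 4)
print(script)
```

Passing the stages in as plain functions keeps the pipeline independent of any one provider, so the one-step variant (a single model that both describes the image and writes the script) could be modeled by making `write_script` return its input unchanged.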

The author demonstrates this process using a video where an AI clone of Sir David Attenborough describes the author drinking from a cup, humorously suggesting it as part of a “mating display.” This video went viral, showcasing the potential of these AI tools.

It was a strange Thanksgiving for Sam Altman. Normally, the CEO of OpenAI flies home to St. Louis to visit family. But this time the holiday came after an existential struggle for control of a company that some believe holds the fate of humanity in its hands. Altman was weary. He went to his Napa Valley ranch for a hike, then returned to San Francisco to spend a few hours with one of the board members who had just fired and reinstated him in the span of five frantic days. He put his computer away for a few hours to cook vegetarian pasta, play loud music, and drink wine with his fiancé Oliver Mulherin. “This was a 10-out-of-10 crazy thing to live through,” Altman tells TIME on Nov. 30. “So I’m still just reeling from that.” Continue reading …

Other stuff

GPT Store now in Superpower ChatGPT. 🎉

1000s of custom GPTs right inside ChatGPT, with more added every day

Superpower ChatGPT Extension on Chrome
Superpower ChatGPT Extension on Firefox
Tools & Links

Santa Cat brings AI-powered holiday magic to families, and showcases building voice-driven virtual characters

CopilotKit – Build in-app AI chatbots 🤖 and AI-powered Textareas into React web apps.

Lume – Automate data mappings using AI – Breakthrough uses AI to find the reason why your users give up

Mastering ChatGPT: Unlock the Power of Prompt Templates & Supercharge Your Workflow!

⚡️ Be the Highlight of Someone’s Day – Think a friend would enjoy this? Go ahead and forward it. They’ll thank you for it!

Hope you enjoyed today’s newsletter

⚡️ Join over 200,000 people using the Superpower ChatGPT extension on Chrome and Firefox.

The Obsessed Guy
Hi, I'm The Obsessed Guy and I am passionate about artificial intelligence. I have spent years studying and working in the field, and I am fascinated by the potential of machine learning, deep learning, and natural language processing. I love exploring how these technologies are being used to solve real-world problems and am always eager to learn more. In my spare time, you can find me tinkering with neural networks and reading about the latest AI research.

