The Memo - 1/Oct/2022
Text-to-Video models, DeepMind Sparrow, OpenAI Whisper for GPT-4 training data, and much more!
FOR IMMEDIATE RELEASE: 1/Oct/2022
Welcome back to The Memo.
The BIG Stuff
Interview with Sam Altman on the next era of AI (13/Sep/2022)
I’m filing this under ‘big stuff’, because Sam Altman is the CEO of OpenAI, which has given us many of the largest and most popular models available (GPT-3, DALL-E 2, CLIP, and more).
The interview—by Reid Hoffman, founder of LinkedIn—covers everything from Transformers to therapist chatbots to having children during the AI revolution. There is an audio version and a transcript version.
Read/listen: https://greylock.com/greymatter/sam-altman-ai-for-the-next-era/
The Interesting Stuff
DeepMind Sparrow chatbot/dialogue model based on Chinchilla 70B (20/Sep/2022)
I’ve documented the 23 rules used by DeepMind in their newest (closed) chatbot model, Sparrow. Even at just 70B parameters, this is currently the largest chatbot in the world by training data: Chinchilla was trained on 1.4 trillion tokens, equivalent to about 2.3TB of training data (see my table below), which is more than all 6.4M books on the Amazon Kindle US store combined.
I’m disappointed and concerned about the time AI labs waste trying to censor and constrain these models to fit the 2020s bias/identity/gender obsession, but I’ll leave that discussion for another time.
Read the chatbot rules: https://lifearchitect.ai/sparrow/
Download my annotation of the prompt: https://lifearchitect.ai/sparrow/
Read the paper: https://storage.googleapis.com/deepmind-media/DeepMind.com/Authors-Notes/sparrow/sparrow-final.pdf
OpenAI releases Whisper model with weights (22/Sep/2022)
OpenAI trained an audio-to-text Transformer model with ‘improved robustness to accents, background noise and technical language’.
From my early, high-level assessment, Whisper’s results seem far superior to Otter.ai, the AI transcription platform that has been my bedrock for the last two years of transcription here at Life Architect Consulting! I used Otter to record and transcribe over 100 interviews for my most recent book, The Ultimate Coach (Nov/2021), available in paperback, Kindle, and audiobook on Amazon.
Check out the Whisper examples, especially the ‘Accent’ example (bottom one in the dropdown menu), where a Scotsman with a very thick accent talks about Merlin splitting the landscape near the Eildons!
Read the blog/paper/examples: https://openai.com/blog/whisper/
Try it out for free in the playground (click the green microphone icon in the top right of the textarea): https://beta.openai.com/playground?model=audio-transcribe-001
p.s. It is highly likely that Whisper was used by OpenAI to transcribe hundreds of millions of audio and video clips to text, creating a larger dataset for training the upcoming GPT-4. Remember, YouTube has around 1 billion videos, with 500 hours of new video uploaded every minute of every day.
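For scale, here’s a very rough back-of-envelope estimate of how much transcript text one year of YouTube uploads could yield. The 500-hours-per-minute figure is from above; the speech rate and bytes-per-word figures are my own assumptions, and treating all footage as continuous speech overestimates:

```python
# Back-of-envelope: transcript text from one year of YouTube uploads.
# Assumptions (mine): ~150 spoken words per minute of footage,
# ~6 bytes per English word including spaces.
HOURS_UPLOADED_PER_MINUTE = 500
WORDS_PER_MINUTE_SPEECH = 150
BYTES_PER_WORD = 6

# Hours of video uploaded in one year (500 hours every minute, all year).
hours_per_year = HOURS_UPLOADED_PER_MINUTE * 60 * 24 * 365   # 262.8M hours
words_per_year = hours_per_year * 60 * WORDS_PER_MINUTE_SPEECH
terabytes_per_year = words_per_year * BYTES_PER_WORD / 1e12

print(f"{hours_per_year / 1e6:.0f}M hours of video uploaded per year")
print(f"~{terabytes_per_year:.0f}TB of raw transcript text")   # ~14TB
```

Even with generous discounting for music, silence, and duplicates, that is a serious pool of fresh text, every year.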
OpenAI has to hit a LOT of data for their next model to align with DeepMind’s Chinchilla scaling laws, which essentially found that GPT-3’s 175B parameters should have been trained on about 12x more data (3.5T training tokens instead of 300B). Here’s my table from my mid-year paper, Integrated AI: The sky is bigger than we imagine (mid-2022 AI retrospective):
Are they aiming for 8TB of text data (from audio/video transcriptions), for a 250B-parameter GPT-4 model?
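The scaling arithmetic here is simple enough to check. Using my rough reading of Chinchilla’s compute-optimal ratio, about 20 training tokens per parameter:

```python
# Chinchilla's rule of thumb: ~20 training tokens per parameter.
TOKENS_PER_PARAM = 20

gpt3_params = 175e9
gpt3_actual_tokens = 300e9

# What GPT-3 "should" have been trained on, per Chinchilla.
chinchilla_optimal = gpt3_params * TOKENS_PER_PARAM      # 3.5T tokens
ratio = chinchilla_optimal / gpt3_actual_tokens          # ~11.7x, i.e. ~12x
print(f"GPT-3 optimal: {chinchilla_optimal / 1e12:.1f}T tokens ({ratio:.1f}x actual)")

# The hypothetical 250B-parameter model floated above.
optimal_tokens_250b = 250e9 * TOKENS_PER_PARAM           # 5T tokens
print(f"A 250B model would want ~{optimal_tokens_250b / 1e12:.0f}T tokens")
```

So a 250B model would want roughly 5T tokens, which makes the 8TB-of-text question above look plausible rather than outlandish.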
We’ll see very soon!
Text-to-Video: Meta AI Make-A-Video (30/Sep/2022)
Prompt: A confused grizzly bear in calculus class
Here it is: text-to-video generation, already! Some thought it was ‘years’ away, but I suspected even ‘months’ was a stretch. This comes just weeks after the last major text-to-image release, Stable Diffusion. Meta AI calls it ‘Make-A-Video’.
Check out the amazing little video demos: https://makeavideo.studio/
Text-to-Video: Phenaki (30/Sep/2022)
Also released today, this looks to be a Chinese model, though the paper is listed anonymously during the double-blind review.
Check out the video results: https://phenaki.video/
Kempner and Harvard (23/Sep/2022)
Mark Zuckerberg and Priscilla Chan have funded a new AI research institute at Harvard, the Kempner Institute, which “seeks to better understand the intersection between natural and artificial intelligence”.
Read the release: https://www.thecrimson.com/article/2022/9/23/zuckerberg-chan-kempner-launch/
Anthropic and Harvard look at models and neurons (15/Sep/2022)
This preprint is a very long and very technical read, but it shows interesting links between models and the brain. Note the collaboration between Anthropic (paper author), DeepMind, and OpenAI (collaborative reviewers) in the comments/review.
https://transformer-circuits.pub/2022/toy_model/index.html
Hydra Attention by Meta AI (15/Sep/2022)
This is a very technical paper, but it essentially outlines a way of building on the Transformer’s multi-head attention, purported to be up to 197x faster than standard attention.
…introducing Hydra Attention, an extremely efficient attention operation for Vision Transformers (ViTs). Paradoxically, this efficiency comes from taking multi-head attention to its extreme: by using as many attention heads as there are features, Hydra Attention is computationally linear in both tokens and features with no hidden constants, making it significantly faster than standard self-attention
Read the paper: https://arxiv.org/abs/2209.07484
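As I read the paper, the core trick is small enough to sketch: with one head per feature and a cosine-similarity kernel, attention collapses to a single global aggregation, so the T×T score matrix never exists. A minimal NumPy version (my own simplification; the real implementation sits inside a ViT and handles batching, the class token, and so on):

```python
import numpy as np

def standard_attention(Q, K, V):
    # Vanilla self-attention: the T x T score matrix is the bottleneck.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])              # O(T^2 * D)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def hydra_attention(Q, K, V):
    # One attention head per feature: L2-normalise each token's feature
    # vector (cosine-similarity kernel), aggregate K*V over all tokens
    # once, then gate with Q. No T x T matrix anywhere.
    Qn = Q / np.linalg.norm(Q, axis=-1, keepdims=True)
    Kn = K / np.linalg.norm(K, axis=-1, keepdims=True)
    global_feature = (Kn * V).sum(axis=0)                # O(T * D), shape (D,)
    return Qn * global_feature                           # O(T * D)

T, D = 197, 64                                           # 197 tokens, 64 features
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((T, D)) for _ in range(3))
print(hydra_attention(Q, K, V).shape)                    # (197, 64)
```

Same input/output shapes as standard attention, but cost is linear in both tokens and features, which is where the headline speedup comes from.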
AI-generated art being sold on stock photo sites (17/Sep/2022)
Shopify’s work (13/Sep/2022)

BLOOM 176B & Stable Diffusion: video for World Peace Day (21/Sep/2022)
With the Wise/White Mirror group, I filmed a short 5-minute walkthrough of generating an AI image and an AI message for World Peace Day 2022. This video is an exclusive for paid readers of The Memo:
ProgPrompt: More Robotics + LLMs (26/Sep/2022)
Researchers at NVIDIA and the University of Southern California hooked GPT-3 up to robots, both physical and simulated. By asking GPT-3 to write code for actions, they are able to generate outputs that are ready to feed to the robots. The results are impressive, and give a vivid idea of what the next generation of embodied AI will be like…
The videos towards the bottom of the blog post are worth watching.
Read the ProgPrompt blog: https://progprompt.github.io/
Read the ProgPrompt paper: https://arxiv.org/abs/2209.11302
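The ProgPrompt idea, as I understand it: show the LLM the robot’s action primitives as Python function stubs, have it complete a plan as code, then run that code against the real primitives. A toy sketch with a hard-coded stand-in for the GPT-3 response (all names here are mine, not from the paper):

```python
actions_log = []

# The robot's action primitives, exposed to the LLM as Python functions.
def grab(obj):
    actions_log.append(f"grab({obj})")

def put_in(obj, container):
    actions_log.append(f"put_in({obj}, {container})")

# In the real system this string comes back from GPT-3, prompted with the
# function signatures above plus a few example plans. Hard-coded here.
generated_plan = """
grab('soda can')
put_in('soda can', 'recycling bin')
"""

# Execute the generated program against the primitives.
exec(generated_plan, {"grab": grab, "put_in": put_in})
print(actions_log)
```

Because the plan is plain code, it can be checked (or simulated) before any robot moves, which is a big part of the appeal.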
AI voice for Darth Vader (26/Sep/2022)
Like Synthesia AI and Sonantic AI (used for Leta AI and Una AI), Ukrainian startup Respeecher has leveraged AI to recreate the voice of James Earl Jones as Darth Vader.
Read the article: https://www.theverge.com/2022/9/24/23370097/darth-vader-james-earl-jones-obi-wan-kenobi-star-wars-ai-disney-lucasfilm
Toys to Play With
Character.ai chatbots (Sep/2022)
Designed by one of the authors of Google’s original Transformer paper, the character.ai chatbots are seeing renewed attention this week. Talk with an AI chatbot primed as a life coach, a squirrel, a spaceship, or any of the other options. Free and web-based, though a login is required.
Play with it yourself: https://beta.character.ai/
Super-simple interface for Stable Diffusion (17/Sep/2022)
No login. No mess. Generate images with one of the most popular text-to-image models in the world, Stable Diffusion.
Try it yourself: https://www.mage.space/
GPT-3 friends in a game (21/Sep/2022)
In this Unity prototype, every word spoken by the two companion characters Jenny and Brayton is made up via GPT-3 based on what happened, what's around them, and what speech mood they're in. Backstories of items are improvised by GPT-3 on the spot, but then memorized for the remainder of the session. Occasionally GPT-3 is asked to summarize the most important knowledge so that it can be referenced in the future.
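The memory trick described above (improvise, memorize, periodically compress) is a pattern worth noting on its own. A minimal sketch, where the summarize function is a placeholder for the game’s actual GPT-3 summarization prompt:

```python
def summarize(events):
    # Placeholder for a GPT-3 call like: "Summarize the key facts so far: ..."
    return "SUMMARY: " + "; ".join(e[:20] for e in events)

class SessionMemory:
    """Keep recent events verbatim; compress older ones into a summary."""
    def __init__(self, max_events=5):
        self.max_events = max_events
        self.events = []

    def remember(self, event):
        self.events.append(event)
        if len(self.events) > self.max_events:
            # Compress everything but the most recent event.
            summary = summarize(self.events[:-1])
            self.events = [summary, self.events[-1]]

    def context(self):
        # This is what gets prepended to the next GPT-3 prompt.
        return "\n".join(self.events)

mem = SessionMemory(max_events=3)
for e in ["Jenny found a rusty key", "Brayton joked about it",
          "They entered the cellar", "A door creaked open"]:
    mem.remember(e)
print(len(mem.events))   # 2: one summary plus the latest event
```

This keeps the prompt within the model’s context window while letting the characters ‘remember’ the whole session.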
Watch the video:
Next
I’ll be back with video coverage of DeepMind Sparrow, and I’m looking forward to Tesla’s Optimus bot reveal in a few hours, which you should be able to watch live on Tesla’s channel at 5PM PT / 8PM ET on Friday 30/Sep/2022. I also expect to have a big cache of videos to record and release in the next few days.
All my very best,
Alan
LifeArchitect.ai