The Memo - 1/Nov/2022
New Stable Diffusion stats, ERNIE-ViLG 2.0, new Chinchilla viz, and much more!
FOR IMMEDIATE RELEASE: 1/Nov/2022
Welcome back to The Memo. This is a really long edition. Thank you to Christopher (Midjourney) and Mateusz (DALL-E 2 + SD) for the complimentary eBooks for paid subscribers. We are also featuring a new 100-page prompt book for Stable Diffusion. You can find all this and more under the ‘Toys to Play With’ section towards the end of this edition.
The BIG Stuff
Stable Diffusion (SD) stats (17/Oct/2022)
I’m putting this under ‘big stuff’ as the open nature of Stability.ai (the company) and Stable Diffusion (the text-to-image model) is a game-changer. The model has been released into the wild, and is being applied to many different use cases (from NSFW content to Pokémon to infinite canvases and beyond!). This month, Stability.ai announced that:
SD has 1 million DreamStudio users (in 50 countries).
170 million images generated via DreamStudio to Oct/2022.
SD downloaded and licensed by more than 200,000 developers globally.
Stability.ai has raised USD $101 million in new funding.
Read the full press release (17/Oct/2022).
Instruct models are outperforming average humans, and already approaching expert-human level (28/Oct/2022)
I’m not a fan of using Twitter as a source for this stuff (and even less of trying to decipher charts ‘designed’ by engineers!), but Twitter seems to be a central source of truth for a lot of AI work. AI Prof Samuel Albanie has highlighted how Flan-PaLM (mentioned in the previous edition of The Memo) is faring on a benchmark measuring 57 tasks across mathematics, history, computer science, and more. Flan-PaLM is outperforming ‘average’ humans, and getting much closer to expert-level human performance on a range of tasks.
Read the unrolled thread: https://threadreaderapp.com/thread/1586052091465723905.html
ServiceNow & Hugging Face release 3.1TB code dataset (27/Oct/2022)
Perhaps in answer to the code copyright issues raised later in this edition (see the Copilot lawsuit below), ServiceNow & Hugging Face have released The Stack: a 3.1TB dataset of permissively licensed code in 30 programming languages. This is about 4x larger than the dataset used to train GPT-3 (though obviously ‘code only’), and 3x the size of CodeParrot, the next-largest released code dataset.
[Sidenote: an exclusive]: In The Stack paper, the researchers did not reference DeepMind's MassiveText dataset, which also collected exactly 3.1TB of permissively licensed code data.
For Github, we restrict the data to only include code with the following permissive licenses: Apache License version 2.0, MIT license, The 3-clause BSD license, The 2-clause BSD license, Unlicense, CC0, ISC license, and Artistic License 2.0.
— Gopher paper, Dec/2021
So, this is either a huge oversight by ServiceNow/Hugging Face, or a deliberate choice to pretend they didn’t know about DeepMind's (unreleased) dataset that essentially does the same thing.
If this is a recreation of DeepMind's dataset (similar to how EleutherAI's The Pile was built as an open counterpart to GPT-3’s dataset), it would have been polite to at least mention the original.
It may be an oversight, as The Stack paper does reference/compare to CodeParrot (open), AlphaCode (closed), CodeGen (closed dataset but open model), and PolyCoder (open).
Read more about The Stack: https://www.bigcode-project.org/docs/about/the-stack/
Read the paper: https://drive.google.com/file/d/17J-0KXTDzY9Esp-JqXYHIcy--i_7G5Bb/view
Download The Stack: https://hf.co/BigCode
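If you want to poke at the data yourself, the Hugging Face datasets library can stream a slice of it without pulling down all 3.1TB. Here is a minimal sketch, assuming the dataset is published under an ID like bigcode/the-stack and organised into per-language folders (check the BigCode page above for the exact name and any access terms):

```python
# Stream a slice of The Stack without downloading the full 3.1TB.
# Assumes a dataset ID like "bigcode/the-stack" (verify on https://hf.co/BigCode);
# access may require accepting the dataset's terms on the Hub first.
from datasets import load_dataset

ds = load_dataset(
    "bigcode/the-stack",     # hypothetical exact ID; see the BigCode org page
    data_dir="data/python",  # assumes a per-language folder layout
    split="train",
    streaming=True,          # iterate lazily instead of downloading everything
)

for i, example in enumerate(ds):
    print(example["content"][:200])  # assumed field name for the source text
    if i >= 2:
        break
```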
The Interesting Stuff
New viz on data-optimal models (Oct/2022)
I wanted to clearly show the training data used across a range of large language models. Here, GPT-3, Gopher, and others are in the ‘red zone’, as they used just 1–2 tokens per parameter. The new standard, the Chinchilla scaling law (discovered by DeepMind in Mar/2022; paper), sits in the centre of the ‘green zone’ and recommends about 20 tokens per parameter. Some newer models are going beyond this (the ‘blue zone’), though they may now be using too much data!
I plan to plot GPT-4 on this chart as soon as that model is released…
Download the PDF: https://lifearchitect.ai/models/#chinchilla
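The Chinchilla recommendation is easy to apply as a back-of-the-envelope check: multiply parameter count by roughly 20 to get a data-optimal token budget. A minimal sketch in Python (the 20:1 ratio is the Chinchilla paper’s rule of thumb; the GPT-3 and Chinchilla figures below are the published ones):

```python
# Chinchilla rule of thumb: ~20 training tokens per model parameter.
TOKENS_PER_PARAM = 20

def data_optimal_tokens(n_params: float) -> float:
    """Return the Chinchilla-recommended training token budget."""
    return n_params * TOKENS_PER_PARAM

for name, params, tokens_used in [
    ("GPT-3",      175e9, 300e9),   # ~1.7 tokens/param -> 'red zone'
    ("Chinchilla",  70e9, 1.4e12),  # ~20 tokens/param  -> 'green zone'
]:
    ratio = tokens_used / params
    optimal = data_optimal_tokens(params)
    print(f"{name}: {tokens_used/1e12:.2f}T tokens used "
          f"({ratio:.1f} tokens/param; data-optimal would be ~{optimal/1e12:.2f}T)")
```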
Shutterstock partners with OpenAI (25/Oct/2022)
The stock photo site Shutterstock has partnered with OpenAI.
From 2021 onwards, Shutterstock sold images and metadata to OpenAI to help create DALL-E (OpenAI CEO Sam Altman describes this data as “critical to the training of DALL-E”). Now, with the integration of OpenAI’s text-to-image AI, the partnership is going full circle, and DALL-E’s output will compete with the same individuals whose work was used to train it.
Read the full story: https://www.theverge.com/2022/10/25/23422359/shutterstock-ai-generated-art-openai-dall-e-partnership-contributors-fund-reimbursement
Articulating the last 18 months of generative AI (28/Oct/2022)
TechCrunch did well with this interview:
Every 14 years, we get one of these Cambrian explosions. We had one around the internet in ’94. We had one around mobile phones in 2008. Now we’re having another one in 2022…
With most technologies, there is sort of an uncomfortableness that people have of [for example] robots replacing a job at an auto factory. When the internet came along, a lot of the people who were doing direct mail felt threatened that companies would be able to sell direct and not use their paper-based advertising services. But [after] they embraced digital marketing, or digital communication through email, they probably had tremendous bumps in their careers, their productivity went up, the speed and efficiency went up. The same thing happened with credit cards online. We didn’t feel comfortable putting credit cards online until maybe 2002. But those who embraced [this wave in] 2000 to 2003 did better.
https://techcrunch.com/2022/10/28/generative-ai/
Baidu releases new text-to-image model: ERNIE-ViLG 2.0 (27/Oct/2022)
Most of these AI models have awkward names! (Well, besides DeepMind’s animal kingdom.) ERNIE-ViLG 2.0 has 24B parameters, and was trained on only 170M image-text pairs, drawn from English datasets like LAION (translated to Chinese) and a series of internal Chinese datasets.
Despite being trained on less than 10% of the images seen by the latest comparable models, ERNIE-ViLG 2.0 is significantly preferred over, and outperforms, DALL-E 2, Imagen, Parti, and Stable Diffusion in most tests:
Read the paper (only a few examples): https://arxiv.org/abs/2210.15257v1
Play with it yourself (this is definitely ERNIE-ViLG 2.0, updated in the last few days): https://huggingface.co/spaces/PaddlePaddle/ERNIE-ViLG
Watch my video:
Financial Times: The golden age of AI-generated art is here. It’s going to get weird (27/Oct/2022)
This is a well-written article from the Financial Times on the advancement of AI in art.
…the technology is advancing swiftly. Six months ago most tools struggled to create human faces, usually offering grotesque combinations of eyes, teeth and stray limbs; today you can ask for a “photorealistic version of Jafar from Disney’s Aladdin sunbathing on Hampstead Heath” and get almost exactly what you’re looking for.
https://www.ft.com/content/073ea888-20d7-437c-8226-a2dd9f276de4
NYT: A.I.-Generated Art Is Already Transforming Creative Work (21/Oct/2022)
Sarah Drummond, a service designer in London, started using A.I.-generated images a few months ago to replace the black-and-white sketches she did for her job. These were usually basic drawings that visually represented processes she was trying to design improvements for, like a group of customers lining up at a store’s cash register.
Instead of spending hours creating what she called “blob drawings” by hand, Ms. Drummond, 36, now types what she wants into DALL-E 2 or Midjourney.
“All of a sudden, I can take like 15 seconds and go, ‘Woman at till, standing at kiosk, black-and-white illustration,’ and get something back that’s really professional looking,” she said.
Read the article: https://archive.ph/soZl5
Massive 4K-resolution DALL-E 2 outputs with inpainting (Apr/2022)
I’m a little late with this one, but wanted to capture it here! David Schnurr is an engineer with OpenAI who has been playing with ‘arbitrarily large’ murals via DALL-E 2. The images below are an incredible 4096×2341 pixels.
Check them out (Twitter) or click on the image above to see the full size in your browser!
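For the curious, the trick behind these murals is an ‘outpainting’ loop: keep shifting the canvas sideways and inpainting the newly exposed strip. David used DALL-E 2’s inpainting; here is a rough sketch of the same idea using open tooling (diffusers’ inpainting pipeline and the runwayml/stable-diffusion-inpainting checkpoint as a stand-in for DALL-E 2; tile sizes and prompt are illustrative):

```python
# Outpainting sketch: grow a mural rightward by repeatedly inpainting
# a half-empty canvas. Open-tool stand-in for DALL-E 2's inpainting.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

prompt = "a sprawling sci-fi cityscape mural, ultra detailed"

# Seed tile: inpaint an entirely blank (fully masked) 512x512 canvas.
mural = pipe(prompt=prompt,
             image=Image.new("RGB", (512, 512)),
             mask_image=Image.new("L", (512, 512), 255)).images[0]

for _ in range(3):  # extend rightward three times
    canvas = Image.new("RGB", (512, 512))
    # Left half of the new canvas = right half of the mural so far.
    canvas.paste(mural.crop((mural.width - 256, 0, mural.width, 512)), (0, 0))
    mask = Image.new("L", (512, 512), 0)
    mask.paste(255, (256, 0, 512, 512))  # white = region to inpaint
    tile = pipe(prompt=prompt, image=canvas, mask_image=mask).images[0]
    grown = Image.new("RGB", (mural.width + 256, 512))
    grown.paste(mural, (0, 0))
    grown.paste(tile, (mural.width - 256, 0))
    mural = grown

mural.save("mural.png")
```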
StockAI as a stock photo platform for AI images (Oct/2022)
Take a look: https://www.stockai.com/photos/collection
CommonSim-1 by Common Sense Machines (17/Oct/2022)
CommonSim-1 is a multimodal world model: a neural network that acts as a simulator, which people can use to generate arbitrary 3D scenes and simulations.
Read the blog post: https://csm.ai/commonsim-1-generating-3d-worlds/
Microsoft releases PACT, a model for robot data (28/Oct/2022)
Perception-Action Causal Transformer (PACT) is a generative transformer-based architecture that aims to build representations directly from robot data in a self-supervised fashion.
Watch the (very fast) video with robotic examples.
Meta announces AudioGen (5/Oct/2022)
Researchers from Meta AI have announced AudioGen: a Transformer-based generative AI model that can generate audio from scratch, to match text input or extend existing audio input.
Paper: https://openreview.net/pdf?id=CYK7RfcOzQ4
Demo: https://anonymous.4open.science/w/iclr2023_samples-CB68/report.html
Copilot lawsuit (Oct/2022)
Some typographer/programmer/lawyer in California doesn’t like how GitHub Copilot (powered by OpenAI Codex/GPT-3) uses code without citing the original licence. So, he’s creating a big fuss and drawing up a potential lawsuit. Unfortunately, his perspective is skewed, and even seems misinformed:
…maybe you’re a fan of Copilot who thinks that AI is the future and I’m just yelling at clouds. First, the objection here is not to AI-assisted coding tools generally, but to Microsoft’s specific choices with Copilot. We can easily imagine a version of Copilot that’s friendlier to open-source developers—for instance, where participation is voluntary, or where coders are paid to contribute to the training corpus. Despite its professed love for open source, Microsoft chose none of these options.
At least his website looks beautiful!
Take a look: https://githubcopilotinvestigation.com/
Watch a recent conversation with Leta about this:
Toys to Play With
Stable Diffusion Prompt Book by OpenArt (28/Oct/2022)
If you want to learn more about generating AI art and how to master prompt crafting, add this free book to your collection; it is enormously valuable. The examples are brilliant, and there are dozens of tips and tricks you can try right now, all for free (I use the official DreamStudio app or the simpler Mage.space interface for generating images with Stable Diffusion; see the code sketch below if you’d rather script it).
Download SD prompt book: PDF (104 pages).
Compare with the DALL-E 2 prompt book featured in The Memo back in Jul/2022…
Download DALL-E 2 prompt book: PDF (82 pages).
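If you’d rather script Stable Diffusion prompts than use a web UI, the open model can be driven locally via Hugging Face’s diffusers library. A minimal sketch, assuming a CUDA GPU and the runwayml/stable-diffusion-v1-5 checkpoint (swap in whatever SD weights you have access to); the prompt book’s tricks simply become part of the prompt string:

```python
# Minimal local Stable Diffusion run via Hugging Face diffusers.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # or another SD checkpoint you have
    torch_dtype=torch.float16,
).to("cuda")

# Prompt-book style: subject + medium + style/lighting modifiers.
prompt = ("a lighthouse on a cliff at golden hour, "
          "oil painting, dramatic lighting, highly detailed")

image = pipe(prompt, guidance_scale=7.5, num_inference_steps=50).images[0]
image.save("lighthouse.png")
```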
Julian Lennon - Lucky Ones music video using Stable Diffusion (22/Oct/2022)
Description: Julian and I want to make a visual piece that shows people from around the world gravitating toward each other in a celebration of change to make the world a better place. Besides everyone's love for music (that naturally draws humans together), they also take small steps to improve our environment back to health.
… [we] used Stable Diffusion & Disco Diffusion on Google Notebook Colab to write Ai code to our footage. Our texted prompts were like writing a script of what we wanted to see come alive in our narrative. It was extremely powerful when dreaming up how scenes could evolve from fiery ashes to blooming flower fields.
Watch the final video using Stable Diffusion:
Interview with the creator: AI art is really rising up and I've just been infatuated with it. I’ve messed around with it a little for still image... it's moving so fast that when I started Lucky Ones production, the technology moved so quickly and I'm like: 'Wow, I think it is possible that I can [take] these frames of this finished work that we're doing, and make it talk to the footage and say something narrative!'
It's definitely going to be beautiful, visceral, abstract, but I also wanted to keep it within our message. It's at that point where we can tell it narrative things and frame by frame (it takes forever)... I have sleepless nights waiting to see the footage.
It's still blowing my mind. I'm using it responsibly I think, because I'm not just trying to throw anything at it and just make something crazy. I'm giving it prompts where I think it can really tell the story of what we want to say.
Watch the behind-the-scenes explanation of AI art by Julian and David (25:33).
AbsXcess: Manga comic book written with Midjourney (25/Oct/2022)
This black-and-white comic book was illustrated by the text-to-image model, Midjourney. It looks spectacular (if you’re into manga!). Thanks to Christopher English and team for releasing this at no charge. Christopher says:
[As a human, I wrote the] text first. I write a page at a time to make sure that I’m able to generate what I want in Midjourney. If it doesn’t look right… then I redesign the scene to fit Midjourney’s limitations. But I always know how the story will begin and end.
Download the print-ready PDF (Warning: 343MB!).
Visit the site: https://english-productions.com/books/
The Curator: Book written with GPT-3 + DALL-E 2 + Stable Diffusion (Oct/2022)
This is an interesting concept, first generating the text via GPT-3, and then combining both DALL-E 2 and Stable Diffusion for some very beautiful artwork. Note that I don’t agree with/condone the AI’s generated text, as it may or may not align with the ‘truth’ (see my video comparing model training with Jell-O for a simple explanation).
Download eBook (spreads for desktop - 16MB).
Download eBook (single pages for mobile - 16MB).
Direct link to the eBook on Gumroad, where you can donate a few € to the (co-)author.
Next
Just eight or so weeks left of 2022… AI labs loved releasing stuff up to the last minute in 2021 (they don’t seem to care about calendars!). I wonder if it will be the same this year…
All my very best,
Alan
LifeArchitect.ai
Housekeeping…
Unsubscribe:
If you subscribed before 17/Jul/2022, please use the older interface, or just reply to this email and we’ll stop your payments and take you off the list!
If you subscribed on or after 17/Jul/2022, please use Substack as usual.
Note that the subscription fee for new subscribers will increase from 1/Jan/2023. If you’re a current subscriber, you’ll always be on your old/original rate while you’re subbed.
Gift a subscription to a friend or colleague for the holiday season: