The Memo - 15/Sep/2022
Google Pathways PaLI, Andi as a replacement for Google search, Adept ACT-1, and much more!
FOR IMMEDIATE RELEASE: 15/Sep/2022
Welcome back to The Memo.
The BIG Stuff
Google Pathways PaLI: Pathways Language and Image model (15/Sep/2022)
Google continues to build on the massive Pathways architecture, announcing a new visual language model today. It is similar to DeepMind Flamingo (which builds on Chinchilla), but is trained on 10B+ images (a dataset larger than LAION and competitors), pairing a large ViT image encoder with the mT5 language model for a total of 17B parameters. As a reminder, visual language models like this one take images and text as input, but output text only. You can see this in action with my videos on DeepMind Flamingo: the Flamingo video - Part 1, and the model joking about a photo of former President Obama in the Flamingo video - Part 2.
Read the paper: https://arxiv.org/abs/2209.06794
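To make the interface concrete: PaLI itself is not publicly available, so the tiny sketch below uses a purely hypothetical wrapper class and checkpoint name. It only illustrates the pattern described above: image plus text prompt in, text out.

```python
# Conceptual sketch only: PaLI is not publicly released, so the class and
# checkpoint name here are hypothetical placeholders. The point is the
# interface: image + text prompt in, text out.
from PIL import Image

class VisionLanguageModel:
    """Hypothetical PaLI-style wrapper: a ViT image encoder feeding an
    mT5-style encoder-decoder that generates the answer as text."""

    def __init__(self, checkpoint: str):
        self.checkpoint = checkpoint  # e.g. a ~17B-parameter ViT + mT5 stack

    def generate(self, image: Image.Image, prompt: str) -> str:
        # 1. Encode the image into visual tokens (ViT).
        # 2. Prepend those tokens to the tokenized text prompt.
        # 3. Autoregressively decode the answer with the language model.
        raise NotImplementedError("Placeholder; no public weights exist.")

model = VisionLanguageModel("pali-17b")  # hypothetical checkpoint name
photo = Image.open("photo.jpg")          # any local image
# model.generate(photo, "Answer in English: what is unusual about this photo?")
```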
I stand by my assertion that Google Pathways is a sleeping hit, and their entire philosophy of designing a single model architecture that does everything is groundbreaking. For a look at the state of play back in Aug/2022(!), download my independent report. Note that two new models have been added to the family in the ~4 weeks since then: PaLM-SayCan for embodiment/robotics, and this PaLI vision model:
Read my original Pathways report (Aug/2022): https://lifearchitect.ai/pathways/
Watch the video (Aug/2022):
New viz: Code Generation models (Sep/2022)
I looked at the major code generation models, including those bigger than GitHub Copilot (OpenAI Codex). Notably, Salesforce’s CodeGen model is open, and larger than Copilot. However, even smaller models, like Google’s internal model trained on its monorepo, are seeing extraordinary results.
https://lifearchitect.ai/models/#code
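If you want to try one of the open models yourself, here is a minimal sketch running a small Salesforce CodeGen checkpoint locally via Hugging Face transformers. The 350M “mono” checkpoint is used so it fits on a laptop; the larger 2B/6B/16B variants follow the same pattern but need much more memory.

```python
# Minimal sketch: run one of Salesforce's open CodeGen checkpoints locally.
# Requires: pip install transformers torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Salesforce/codegen-350M-mono"  # small variant; 2B/6B/16B also exist
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

prompt = "# Return the nth Fibonacci number\ndef fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```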
Adept’s ACT-1 Transformer model (14/Sep/2022)
Co-founded by Dr Ashish Vaswani, a co-creator of the original Transformer at Google, staffed by other researchers coaxed away from DeepMind, Google Brain, and OpenAI, and backed by more than $65M in funding, Adept has finally detailed something amazingly powerful. The ACT-1 model looks like it brings a dramatic improvement to human efficiency in browser and application use.
ACT-1 is a large-scale Transformer trained to use digital tools — among other things, we recently taught it how to use a web browser. Right now, it’s hooked up to a Chrome extension which allows ACT-1 to observe what’s happening in the browser and take certain actions, like clicking, typing, and scrolling, etc. — Adept (14/Sep/2022)
You have to see it to believe it.
Take a look at the demos: https://www.adept.ai/act
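Adept has not released ACT-1 or an API, but the observe → decide → act loop described in the quote above is easy to picture. Here is a hedged sketch using Playwright for the browser side, with a hard-coded propose_action() stub standing in for the model itself.

```python
# Illustrative sketch only: Adept has not released ACT-1 or an API. This is the
# general observe -> decide -> act loop from the quote above, with Playwright
# driving the browser and a hypothetical stub standing in for the model.
# Requires: pip install playwright && playwright install chromium
from playwright.sync_api import sync_playwright

def propose_action(page_html: str, goal: str) -> dict:
    """Hypothetical stand-in for ACT-1: map page state + goal to one action."""
    if "illustrative examples" in page_html:                       # still on example.com
        return {"type": "click", "selector": "text=More information"}
    return {"type": "done"}                                        # stop (for the demo)

def run(goal: str, max_steps: int = 5):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto("https://example.com")
        for _ in range(max_steps):
            action = propose_action(page.content(), goal)  # observe -> decide
            if action["type"] == "click":                  # act
                page.click(action["selector"])
            elif action["type"] == "type":
                page.fill(action["selector"], action["text"])
            elif action["type"] == "scroll":
                page.mouse.wheel(0, 600)
            else:
                break  # a real agent would also check whether the goal is complete
        browser.close()

run("Open the documentation page")
```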
The Interesting Stuff
GitHub looks at developer stats for Copilot (8/Sep/2022)
Following on from my viz, GitHub found that, in general, developers who used GitHub Copilot completed tasks about twice as fast, and 88% of them reported feeling more productive!
Google AudioLM (7/Sep/2022)
Google’s AudioLM model is a way to generate high-quality audio via language modeling.
Starting from raw audio waveforms, we first construct coarse semantic tokens from a model pre-trained with a self-supervised masked language modeling objective. Autoregressive modeling of these tokens captures both local dependencies (e.g., phonetics in speech, local melody in piano music) and global long-term structure (e.g., language syntax and semantic content in speech; harmony and rhythm in piano music)… However, these tokens lead to poor reconstruction. To overcome this limitation, in addition to semantic tokens, we rely on fine-level acoustic tokens produced by a SoundStream neural codec, which capture the details of the audio waveform and allow for high-quality synthesis. Training a language model to generate both semantic and acoustic tokens leads simultaneously to high audio quality and long-term consistency.
Read the paper: https://arxiv.org/abs/2209.03143
Listen to the samples: https://google-research.github.io/seanet/audiolm/examples/
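Neither AudioLM nor the SoundStream codec has been publicly released, but the recipe in that excerpt can be sketched. Everything below is a hypothetical placeholder returning dummy values; only the two-stage structure (semantic tokens for long-term coherence, acoustic tokens for waveform detail, one language model over both) follows the description.

```python
# Conceptual sketch of the two-stage token recipe quoted above. Neither AudioLM
# nor SoundStream is publicly released, so every function here is a hypothetical
# placeholder; only the structure follows the paper's description.
from typing import List, Optional
import numpy as np

def semantic_tokenize(waveform: np.ndarray) -> List[int]:
    """Placeholder: coarse tokens from a self-supervised masked-LM audio model."""
    return [0] * 50

def acoustic_tokenize(waveform: np.ndarray) -> List[int]:
    """Placeholder: fine-grained tokens from a SoundStream-style neural codec."""
    return [0] * 500

def lm_continue(tokens: List[int], conditioning: Optional[List[int]] = None) -> List[int]:
    """Placeholder: one autoregressive language model over discrete audio tokens."""
    return tokens + [0] * 100

def codec_decode(acoustic: List[int]) -> np.ndarray:
    """Placeholder: SoundStream-style decoder, tokens back to a waveform."""
    return np.zeros(16000, dtype=np.float32)

def continue_audio(prompt_waveform: np.ndarray) -> np.ndarray:
    semantic = lm_continue(semantic_tokenize(prompt_waveform))            # long-term structure
    acoustic = lm_continue(acoustic_tokenize(prompt_waveform), semantic)  # waveform detail
    return codec_decode(acoustic)

print(continue_audio(np.zeros(48000)).shape)  # dummy 3-second prompt at 16 kHz
```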
EU AI laws (Sep/2022)
The EU’s attempt to regulate open-source AI is counterproductive
A few years ago, my close friend and colleague, the late Dr Marianna Rusche, worked on the EU AI laws. Since her passing, the EU has continued to contribute little to the AI landscape, focusing instead on holding back humanity through convoluted legal frameworks (similar to the GDPR farce). These new AI laws are a true abomination. The EU is wasting time trying to regulate a revolution that is in its fledgling stages, hamstringing humanity in the process. Their first order of business is blocking open-source models and limiting the visibility of AI models outside of big corporations. I have nothing good to say, and nothing more to add. Brookings provides some critical review:
[T]he public availability of GPAI models helps identify problems and advance solutions in the societal interest… Without open-source GPAI, the public will know less, and large technology companies will have more influence over the design and execution of these models. — Brookings (24/Aug/2022)
The Brookings report: https://www.brookings.edu/blog/techtank/2022/08/24/the-eus-attempt-to-regulate-open-source-ai-is-counterproductive/
Slashdot commentary: https://news.slashdot.org/story/22/09/06/2333234/the-eus-ai-act-could-have-a-chilling-effect-on-open-source-efforts
AI + Disney (10/Sep/2022)
[Disney’s Chief Scientist, Markus] Gross said that Disney wanted to use AI and deep learning to create animated characters that are “truly art directable in real time.” He spoke of the possibility of directors being able to give verbal instructions to animated characters about how to walk or which way to move.
https://www.ibc.org/news/ai-transforming-movie-production-at-disney-with-more-to-come/9075.article
Toys to Play With
Andi search (Sep/2022)
Andi is powered by large language models, and backed by Y Combinator. I had very good results using it as a Google search replacement this week!
[Andi] attempts to find and extract answers to questions, combining large language models akin to OpenAI’s GPT-3 with live web data.
Behind the scenes, Andi extracts information from web results ranked for relevance to the question being asked as well as overall quality (although it’s not clear how Andi defines “quality”). Depending on the subject matter, the platform uses different AI systems tailored for specific verticals (e.g. factual knowledge, programming or consumer health) and language models that generate answers by combining knowledge across multiple sources (e.g. Wolfram Alpha, Forbes, The New York Times, etc.). — TechCrunch (13/Sep/2022)
Try it: https://andisearch.com/
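For the curious, the “ranked web results plus a language model” pattern TechCrunch describes can be sketched in a few lines. This is not Andi’s implementation: the web_search() stub is hypothetical, and the answer-generation call simply uses OpenAI’s public GPT-3 completions API.

```python
# Sketch of the general "search + LLM" pattern described above, not Andi's code.
# The web_search() stub is hypothetical; the completion call uses the OpenAI
# GPT-3 API (pip install openai, set OPENAI_API_KEY).
import openai

def web_search(query: str) -> list:
    """Hypothetical stand-in for a search backend returning ranked results."""
    return [{"title": "Example page", "snippet": "Example snippet about the query."}]

def answer(question: str) -> str:
    results = web_search(question)
    context = "\n".join(f"- {r['title']}: {r['snippet']}" for r in results[:5])
    prompt = (
        "Answer the question using only the sources below, and cite them.\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    completion = openai.Completion.create(
        engine="text-davinci-002", prompt=prompt, max_tokens=200, temperature=0
    )
    return completion.choices[0].text.strip()

print(answer("Who founded Y Combinator?"))
```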
Download and run Stable Diffusion on your M1 Mac (12/Sep/2022)
This really is fast: a one-click install for Stable Diffusion on your Apple Silicon Mac. Images are generated in around one minute.
https://github.com/divamgupta/diffusionbee-stable-diffusion-ui
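If you’d rather script it than use the one-click app, a minimal sketch using Hugging Face’s diffusers library on Apple Silicon looks like the following (assuming you have accepted the model license on the Hub and are logged in with huggingface-cli login).

```python
# Minimal scripted alternative to the one-click app: Stable Diffusion via
# Hugging Face diffusers on an Apple Silicon Mac.
# Requires: pip install diffusers transformers torch
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe = pipe.to("mps" if torch.backends.mps.is_available() else "cpu")  # use the Apple GPU if present

prompt = "a watercolor painting of a lighthouse at dawn"
image = pipe(prompt, num_inference_steps=30).images[0]
image.save("lighthouse.png")
```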
Read a tech-noir story by my friend in Germany + GPT-3 (2021)
I LOVED reading this story, written by a Leta AI viewer in Germany using the massive GPT-3 language model. His pen name is Derek Beauregard. It's the length of a proper book, though unedited. This is an exclusive: he is not releasing it publicly, and there is nothing like this in the wild (or on Amazon) yet. Thank you so much to D for releasing this book to The Memo readers at no fee! Fall in love with Lucy and friends in Terminal City Chronicles (PDF).
Next
We’re definitely into the AI summer, though the last ~2-3 weeks have been quieter than usual. I’ll be back soon with the next big thing(s)!
All my very best,
Alan
LifeArchitect.ai