The Memo - 30/Apr/2023
Stability/DeepFloyd IF, RedPajama dataset, 'Laptop model' comparisons (Alpaca, Chimera), and much more!
FOR IMMEDIATE RELEASE: 30/Apr/2023
Welcome back to The Memo.
Thanks for being a part of history!
In the Policy section we look at what the US Government is doing with LLMs for Congress, take an inside look at how the US is implementing several post-2022 LLMs for military use, review the EU’s latest updates to its Draft AI Act, and dive into an English copy of Japan’s latest AI white paper. (By the way, some of these resources are next to impossible to find. I should know; it’s my job to source them for you and put them here in The Memo!)
In the Toys to play with section, we look at some huge new dialogue models and interfaces, a free new iPhone app for AI video, a new hands-on tutorial by OpenAI, and much more…
Special: As promised, I’m also providing my recent 2-hour private keynote/workshop, delivered at a well-produced, multi-camera paid event in Sydney that was held live and streamed (I believe tickets were $4,999). Links are provided at the end of this edition for paid readers.
Part I: Large language models (hands-on with ChatGPT for business).
Part II: AI art (hands-on with Midjourney v5 including live audience examples).
The BIG Stuff
Exclusive: Users spent 13.9 billion minutes interacting with ChatGPT in March (Apr/2023)
1.6 billion visits × 8m44s (≈8.73 minutes) = 13.968 billion minutes
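Here’s the arithmetic behind that estimate as a tiny script. It uses only the two inputs above: 1.6 billion visits and an average visit length of 8m44s.

```python
# Inputs quoted above: visits in March 2023 and average visit length (8m44s).
VISITS = 1.6e9
MINUTES_PER_VISIT = 8 + 44 / 60  # 8m44s = 8.7333... minutes

# Total time spent = number of visits x average minutes per visit.
total_minutes = VISITS * MINUTES_PER_VISIT
print(f"{total_minutes / 1e9:.2f} billion minutes")  # prints 13.97 billion minutes
```

Using the exact 8m44s gives about 13.97 billion minutes; rounding the visit to 8.73 minutes gives 13.968 billion. Either way, it lands just under 14 billion minutes.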
AI via ChatGPT is now ‘better’ and 9.8 times ‘more empathetic’ than a doctor (28/Apr/2023)
[This study used ChatGPT.] Chatbot responses were rated of significantly higher quality than physician responses…9.8 times higher prevalence of empathetic or very empathetic responses for the chatbot… If more patients’ questions are answered quickly, with empathy, and to a high standard, it might reduce unnecessary clinical visits, freeing up resources for those who need them… High-quality responses might also improve patient outcomes… responsive messaging may collaterally affect health behaviors, including medication adherence, compliance (eg, diet), and fewer missed appointments.
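A quick note on reading ratios like this one: ‘X times higher’ and ‘X% more’ are not interchangeable. A ratio r corresponds to an increase of (r − 1) × 100%, so the study’s 9.8-fold prevalence of empathetic responses works out to 880% more. The helper below is a trivial sketch of that conversion.

```python
def ratio_to_percent_increase(ratio: float) -> float:
    """Convert an 'X times higher' ratio into an 'X% more' figure.

    A ratio of 1.0 means no change, so the increase is (ratio - 1) * 100.
    """
    return (ratio - 1.0) * 100.0


# The study reports a 9.8x higher prevalence of empathetic responses.
print(f"9.8x higher = {ratio_to_percent_increase(9.8):.0f}% more")  # 880% more
```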
Google Brain merges with DeepMind (20/Apr/2023)
As referenced several times in the last few editions of The Memo, Google has formally announced that Google Brain and DeepMind are combining into a single unit, Google DeepMind.
The research advances from the phenomenal Brain and DeepMind teams laid much of the foundations of the current AI industry, from Deep Reinforcement Learning to Transformers, and the work we are going to be doing now as part of this new combined unit will create the next wave of world-changing breakthroughs.
Bloomberg: LLMs increase productivity by 14% (24/Apr/2023)
Customer service workers at a Fortune 500 software firm who were given access to generative artificial intelligence tools became 14% more productive on average than those who were not, with the least-skilled workers reaping the most benefit.
That’s according to a new study by researchers at Stanford University and the Massachusetts Institute of Technology who tested the impact of generative AI tools on productivity at the company over the course of a year.
The research marks the first time the impact of generative AI tools on work has been measured outside the lab.
The population for this study was Filipino call centre workers.
Compare this with GitHub’s Sep/2022 research into productivity using LLMs (Copilot, powered by OpenAI’s Codex, a descendant of GPT-3), where the population was computer programmers:
[Software] developers who used GitHub Copilot completed the task significantly faster–55% faster than the developers who didn’t use GitHub Copilot. Specifically, the developers using GitHub Copilot took on average 1 hour and 11 minutes to complete the task, while the developers who didn’t use GitHub Copilot took on average 2 hours and 41 minutes. These results are statistically significant (P=.0017) and the 95% confidence interval for the percentage speed gain is [21%, 89%].
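‘55% faster’ can be read two ways, so it helps to check it against the quoted times. Here’s a short sketch using only the figures in the quote (1h11m vs 2h41m). The numbers suggest GitHub’s figure means the Copilot group took about 55% less time, which is roughly 2.3 times the speed.

```python
# Average task times quoted by GitHub (Sep/2022), converted to minutes.
with_copilot = 1 * 60 + 11     # 1h11m = 71 minutes
without_copilot = 2 * 60 + 41  # 2h41m = 161 minutes

# Share of time saved relative to the non-Copilot group...
time_saved = (without_copilot - with_copilot) / without_copilot
# ...which is a different number from the speedup ratio.
speedup = without_copilot / with_copilot

print(f"time saved: {time_saved:.1%}")  # time saved: 55.9%
print(f"speedup: {speedup:.2f}x")       # speedup: 2.27x
```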
A plethora of laptop models (25/Apr/2023)
I use the term ‘laptop model’ to refer to any model that can fit in RAM on a 2023 laptop! This includes models like Alpaca, Dolly 2.0, BELLE, Vicuna, Koala, and Chimera, most of which are built on Meta’s LLaMA (Dolly 2.0 is built on EleutherAI’s Pythia instead).
You may have looked on as these were released this year, and you may have even downloaded some of them to try on your local computer. While they are nowhere near as powerful as GPT-4 or PaLM, their size and accessibility are a huge leap toward making language models available to people everywhere.
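Whether a model ‘fits in RAM’ mostly comes down to parameter count times bytes per parameter. Here’s a rough back-of-the-envelope sketch for the weights only. It ignores activations, the context cache, and runtime overhead, and it assumes either 16-bit weights or 4-bit quantization (as popularized by tools like llama.cpp). 7B and 13B are two of the published LLaMA sizes.

```python
def weights_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Rough memory needed just to hold a model's weights, in decimal GB.

    Ignores activations, cache, and runtime overhead, so real usage is higher.
    """
    return n_params * bits_per_param / 8 / 1e9


for params in (7e9, 13e9):
    for bits in (16, 4):
        print(f"{params / 1e9:.0f}B params @ {bits}-bit: "
              f"{weights_memory_gb(params, bits):.1f} GB")
```

This is why a 4-bit 7B or 13B model (roughly 3.5 GB or 6.5 GB of weights) sits comfortably on a 16 GB laptop, while the same models at 16-bit need about 14 GB and 26 GB.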
The table below is from the paper ‘Phoenix: Democratizing ChatGPT across Languages’ by Chen et al., pp4, released 20/Apr/2023. There have been a few more models released in the 10 days since then!
And the figure below compares the ‘relative response quality’ of selected laptop models and dialogue models (like Bard and ChatGPT) as assessed by GPT-4 (pp11).
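I haven’t reproduced the Phoenix evaluation, but here’s one plausible way a ‘relative response quality’ figure can be computed. It assumes, as in Vicuna-style GPT-4 evaluations, that a judge model scores each model’s answers to the same questions and the total is divided by a reference model’s total (for example, ChatGPT). The scores below are hypothetical placeholders, not results from the paper.

```python
def relative_response_quality(model_scores, reference_scores):
    """Total judge score of a model divided by the reference model's total.

    Assumes (hypothetically) that both lists hold per-question scores from
    the same judge (e.g. GPT-4) on the same set of questions.
    """
    if len(model_scores) != len(reference_scores):
        raise ValueError("score lists must cover the same questions")
    return sum(model_scores) / sum(reference_scores)


# Hypothetical per-question scores (out of 10), for illustration only.
laptop_model = [7, 8, 6]
chatgpt = [8, 8, 8]
print(f"{relative_response_quality(laptop_model, chatgpt):.1%}")  # 87.5%
```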
The Interesting Stuff
Datasets: training on code may lead to reasoning ability (2022-2023)
Datasets are the big buckets of words used by AI labs to train models. Generally, they consist of web pages, books, articles, and Wikipedia. (Watch my 2-min video on this, or a longer version ‘for humans’.)
Researchers at Allen AI (with review by Google Brain) suggest that including code in training datasets may be what gives models the ability to reason, especially via chain-of-thought (CoT).
We have concluded:
The ability to perform complex reasoning is likely to be from training on code.
If we consider the logic required to work through a program, it’s easy to see how useful this kind of structured, step-by-step thinking might be, both in human thought and in daily living. For example, it may be that seeing even a simple program in Pascal, BASIC, or C (with its various functions and references) during training exposes a model to patterns that imitate some of the routines in our daily lives.
The paper is still at a very early draft (notes/outline) stage.
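To make the intuition concrete, here’s a toy example of my own (not from the paper). A short program forces every intermediate step to be named and computed before the answer, much like a chain-of-thought written out in prose. The word problem is the familiar tennis-ball example used to demonstrate chain-of-thought prompting.

```python
def tennis_balls() -> int:
    # Problem: Roger has 5 tennis balls. He buys 2 cans, each holding 3 balls.
    # How many tennis balls does he have now?
    start = 5                      # step 1: what he already has
    cans, balls_per_can = 2, 3     # step 2: what he buys
    bought = cans * balls_per_can  # step 3: 2 x 3 = 6 new balls
    return start + bought          # step 4: 5 + 6 = 11


print(tennis_balls())  # prints 11
```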
Datasets: Bigger and bigger (Apr/2023)
In a recent video I made an off-the-cuff remark that I’ve been ‘obsessed’ with datasets for a long time… it’s true!
My Mar/2022 paper ‘What’s in my AI? A Comprehensive Analysis of Datasets Used to Train GPT-1, GPT-2, GPT-3, GPT-NeoX-20B, Megatron-11B, MT-NLG, and Gopher’ was well received in academic, corporate, government, and intergovernmental circles.
Since that paper’s release, there have been a few more datasets to analyze.
Feb/2023: Meta AI’s LLaMA dataset with its 4TB of Common Crawl.
Mar/2023: OpenAI’s GPT-4 dataset designed by a team of 35 staff. No other information is available, but I’ve put together my best estimates on this dataset.
Apr/2023: Stability AI’s version of The Pile dataset, announced alongside its StableLM models; no details have been released yet.
Apr/2023: Together AI’s RedPajama dataset.
This most recent one is interesting, as it clones the LLaMA dataset almost exactly:
RedPajama is more than double the size of the GPT-3 dataset (celebrating its 3rd anniversary in May/2023), but less than 10% of the size of GPT-4’s dataset using my estimates.
There is a seeming ‘duplication’ of web crawl data, using both a standard Common Crawl and Google’s filtered version of Common Crawl, C4. In effect, this means they have 20% of ‘clean’ Common Crawl (C4), and are then adding another 80% of unfiltered Common Crawl with ‘work to be done’. That work includes removing boilerplate and repeated text like footers, stripping out HTML, and more.
This open-source dataset contains 200GB of GitHub code (40% less than LLaMA), plus another 67GB of StackExchange code discussion. As highlighted above, allowing models to ‘see’ code during training may support complex reasoning abilities.
The dataset fits in nicely with the other recent releases, though it’s interesting to note just how much bigger my estimate of the GPT-4 dataset is!
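The ‘work to be done’ on unfiltered Common Crawl is easier to picture with code. Below is a minimal, standard-library-only sketch of two of the steps mentioned above: stripping HTML and dropping lines that repeat across many pages (footers, menus). It is my own simplified illustration, not RedPajama’s or LLaMA’s actual pipeline; the tag list and the 50% threshold are arbitrary assumptions.

```python
from collections import Counter
from html.parser import HTMLParser

# Tags treated as line breaks so separate blocks of text don't run together.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6",
              "header", "footer", "nav", "section", "article"}


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def strip_html(html: str) -> list[str]:
    """Returns the non-empty, whitespace-trimmed text lines of one page."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    return [line.strip() for line in text.splitlines() if line.strip()]


def clean_pages(pages: list[str], max_doc_fraction: float = 0.5) -> list[str]:
    """Strips HTML, then drops lines that recur across many pages.

    max_doc_fraction is an arbitrary illustrative threshold: a line found on
    more than this share of pages is treated as boilerplate (e.g. a footer).
    """
    docs = [strip_html(page) for page in pages]
    # Count each distinct line once per page it appears on.
    doc_freq = Counter(line for doc in docs for line in set(doc))
    limit = max_doc_fraction * len(docs)
    return ["\n".join(line for line in doc if doc_freq[line] <= limit)
            for doc in docs]
```

Real pipelines add much more (language identification, quality filtering, fuzzy deduplication), but the shape is the same: extract text, then remove what repeats.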
If you’ve wondered what exactly ChatGPT (and other models) know about you, you can search for your own name or other interesting data in this searchable index here: