FOR IMMEDIATE RELEASE: 17/Sep/2023
Dr Sébastien Bubeck, Microsoft Research (7/Apr/2023):
There is some intelligence in this [GPT-4] system… Beware of the trillion-dimensional space. It's something which is very, very hard for us as human beings to grasp. There is a lot that you can do with a trillion parameters…
It could absolutely build an internal representation of the world, and act on it as the processing progresses through the layers and through the sentence temporally… We shouldn't think about those neural networks as learning a simple concept like ‘Paris is the capital of France.’ It's doing much more, like learning operators, it’s learning algorithms. It's not just retrieving information, not at all. It has built internal representations that allow it to reproduce the data that it has seen succinctly… Yes, it was trained just to predict the next word. But what emerged out of this is a lot more than just a statistical pattern-matching object.
Welcome back to The Memo.
You’re joining full subscribers from Harvard, Rice, Columbia, MIT, Cornell, Stanford, Brown, UC Berkeley, FSU, Princeton, and more…
This is another long edition. I don’t usually announce my keynotes here (nearly all of them are for private bodies), but I’m really looking forward to opening the next public Devoxx event in Ukraine, ‘AI – Friend or Foe?’ My keynote is called ‘Superintelligence: No one is smart enough’ and the recording will be made available to full subscribers. The next roundtable is on 23/Sep/2023, details at the end of this edition.
In the Toys to play with section, we look at a new prediction viz for GPT, a brilliant application of image models for your own family, and a new vision-language-action model for self-driving vehicles.
The BIG Stuff
Andreessen Horowitz: How Are Consumers Using Generative AI? (13/Sep/2023)
ChatGPT: estimated 200M monthly users, 1.6B monthly visits (Jun/2023).
ChatGPT: 24th most visited website globally.
80% of the top 50 AI products didn’t exist a year ago.
Read more: https://a16z.com/how-are-consumers-using-generative-ai/
Falcon 180B is the largest open(-ish) dense model right now (6/Sep/2023)
TII in Abu Dhabi released a 180B-parameter version of Falcon, trained on 3.5T tokens (roughly 20 tokens per parameter). It is the largest and highest-performing open dense model in the world right now, about 2.5 times the size of Llama 2 (70B). Note that ‘open’ here does not extend to full commercial use.
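As a quick sanity check on that tokens-per-parameter figure, here’s a minimal Python sketch using the numbers from the announcement; the ~20-tokens-per-parameter ‘Chinchilla’ guideline is included only as a comparison point, not something TII stated.

```python
# Tokens-per-parameter check for Falcon 180B, using the announced figures.
params = 180e9    # 180B parameters
tokens = 3.5e12   # 3.5T training tokens

print(f"Tokens per parameter: {tokens / params:.1f}:1")  # ~19.4:1, i.e. roughly 20:1

# For comparison, the 'Chinchilla' compute-optimal guideline of ~20 tokens/parameter
# would suggest about this many training tokens for a 180B model:
print(f"Chinchilla-style target: {20 * params / 1e12:.1f}T tokens")
```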
Read more: https://huggingface.co/blog/falcon-180b
Try the demo: https://huggingface.co/spaces/tiiuae/falcon-180b-demo
See Falcon 180B on the Models Table and Timeline.
Watch a replay of my Falcon 180B livestream.
Apple UniLM 34M + Apple Ajax 200B (7/Sep/2023)
Hacker Jack Cook discovered Apple’s newest LLM, a tiny Transformer model used for on-device predictive text in the next versions of macOS (Sonoma 14.0) and iOS (iOS 17).
From my calculations based on sizes of each layer, Apple’s predictive text model appears to have about 34 million parameters, and it has a hidden size of 512 units. This makes it much smaller than even the smallest version of GPT-2.
This one is so interesting to me because Apple has 2B+ active devices out there (Feb/2023).
UniLM may be the first Transformer model shipped on-device at this scale, and it’s hitting a huge slice of the population.
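If you’re curious how a parameter estimate like Jack’s is put together, here is a rough back-of-the-envelope sketch for a small GPT-2-style decoder. Only the hidden size of 512 comes from his write-up; the layer count and vocabulary size below are illustrative assumptions, not figures from the post.

```python
# Back-of-the-envelope parameter count for a small GPT-2-style decoder.
# Only the hidden size (512) comes from Jack Cook's write-up; the layer count
# and vocabulary size are illustrative assumptions.
d_model = 512        # hidden size reported in the write-up
n_layers = 6         # assumed number of transformer blocks
vocab = 15_000       # assumed (sub)word vocabulary size

per_block = (
    4 * d_model * d_model            # attention: Q, K, V, output projections
    + 2 * d_model * (4 * d_model)    # MLP: up- and down-projections (4x expansion)
)
embeddings = vocab * d_model         # token embedding table (often tied with the output head)

total = n_layers * per_block + embeddings
print(f"~{total / 1e6:.0f}M parameters")  # tens of millions, the same ballpark as the ~34M estimate
```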
Read more: https://jackcook.com/2023/09/08/predictive-text.html
Browse the repo: https://github.com/jackcook/predictive-spy
Apple’s most advanced LLM, known internally as Ajax GPT, has been trained on “more than 200 billion parameters” and is more powerful than OpenAI’s GPT-3.5…
See UniLM on the Models Table.
We covered some of this (UniLM and Ajax) in my recent livestream (replay).
Bonus: There were 9 major model announcements in the first two weeks of Sep/2023.
TinyLlama
Falcon 180B
FLM-101B
Persimmon-8B
UniLM
phi-1.5
NExT-GPT
MoLM
DeciLM
Read more (including playground and paper) on the Models Table and Timeline.
GPT-4 hits 99th percentile in creativity testing (25/Aug/2023)
The gold standard for testing creativity is the Torrance Tests of Creative Thinking, TTCT (wiki). The TTCT was designed to measure six sub-constructs of creativity (fluency, flexibility, originality, elaboration, titles, and closure), plus an overall measure of creative strength.
GPT-4 scored in the top 1% of test-takers for the originality of its ideas… Scholastic Testing Service is a private company and does not share its prompts with the public. This ensured that GPT-4 could not have scraped the internet for past prompts and their responses.
Read more: https://theconversation.com/ai-scores-in-the-top-percentile-of-creative-thinking-211598
I’ve updated my viz to include this important benchmark:
Stability AI releases Stable Audio (13/Sep/2023)
I keep thinking back to when we didn't have Stability AI, and it was just Google and Meta teasing us with mouth watering papers, but never letting us touch them. I'm so thankful Stability exists. (HN user, 13/Sep/2023)
Stable Audio is a latent diffusion model (like Stable Diffusion) trained on >800k [studio-quality] sound files. A 907M-parameter U-Net powers Stable Audio.
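For anyone who wants the intuition behind ‘latent diffusion’ here, the sketch below is a generic, heavily simplified DDPM-style sampling loop over a latent, written in plain NumPy. It is not Stability AI’s code: predict_noise and decode_audio are placeholders standing in for the trained 907M-parameter U-Net and the audio decoder, and the noise schedule is an assumption.

```python
import numpy as np

# Generic DDPM-style sampling over a latent, as a conceptual illustration of latent
# diffusion. NOT Stability AI's implementation: `predict_noise` and `decode_audio`
# are placeholders for the trained U-Net and audio decoder.
rng = np.random.default_rng(0)

T = 50
betas = np.linspace(1e-4, 0.02, T)   # linear noise schedule (assumption)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def predict_noise(latent, t):
    # Placeholder: a real U-Net predicts the noise present in `latent` at step `t`.
    return 0.1 * latent

def decode_audio(latent):
    # Placeholder: a real decoder maps the denoised latent back to a waveform.
    return latent.flatten()

latent = rng.standard_normal((1, 64, 1024))  # start from pure Gaussian noise in latent space

for t in reversed(range(T)):
    eps = predict_noise(latent, t)
    # Standard DDPM posterior-mean update (Ho et al., 2020)
    latent = (latent - betas[t] / np.sqrt(1 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
    if t > 0:
        latent += np.sqrt(betas[t]) * rng.standard_normal(latent.shape)

print(decode_audio(latent).shape)
```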
Read release: https://stability.ai/research/stable-audio-efficient-timing-latent-diffusion
Try it out: https://stableaudio.com/
Exclusive: Microsoft argues for data quality with phi-1.5 (12/Sep/2023)
Microsoft’s latest LLM is a 1.3B-parameter model trained on 150B tokens, with performance comparable to models 5x larger.
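To put those numbers in perspective, here is a quick training-compute comparison using the standard FLOPs ≈ 6 × parameters × tokens approximation. The comparison point (a hypothetical 5x-larger model trained at a Chinchilla-style ~20 tokens per parameter) is my assumption, not something from the paper.

```python
# Rough training-compute comparison using the standard C ~= 6 * N * D approximation.
def train_flops(params, tokens):
    return 6 * params * tokens

phi_flops = train_flops(1.3e9, 150e9)                # phi-1.5: 1.3B params, 150B tokens
big_flops = train_flops(5 * 1.3e9, 20 * 5 * 1.3e9)   # hypothetical 6.5B model at ~20 tokens/param

print(f"phi-1.5 training compute: {phi_flops:.2e} FLOPs")  # ~1.2e21
print(f"Hypothetical 6.5B model:  {big_flops:.2e} FLOPs")  # ~5.1e21
print(f"Ratio: ~{big_flops / phi_flops:.1f}x")             # phi-1.5 uses ~4x less training compute
```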
The most interesting part of this entire piece is that Microsoft is arguing against ever-larger datasets and proposing that dataset quality is more important than sheer quantity.
It reminds me of my exploration of this topic more than two years ago, back in Jun/2021, with my paper Integrated AI: Dataset quality vs quantity via bonum (GPT-4 and beyond). Microsoft came to the same conclusion, finding that (I’m gonna bold the whole thing; it’s important!):
…the creation of a robust and comprehensive dataset demands more than raw computational power: It requires intricate iterations, strategic topic selection, and a deep understanding of knowledge gaps to ensure quality and diversity of the data. We speculate that the creation of synthetic datasets will become, in the near future, an important technical skill and a central topic of research in AI.
Read the paper: https://arxiv.org/abs/2309.05463
Watch the related video by Microsoft’s Dr Sébastien Bubeck.
See phi-1.5 on the Models Table.
Exclusive: Inflection ready to train a 1,000T parameter model (1/Sep/2023)