The Memo - Special Edition - Google DeepMind Gemini pre-release report - 8/Sep/2023
Google DeepMind Gemini: A general specialist
FOR IMMEDIATE RELEASE: 8/Sep/2023
Welcome back to The Memo.
This is an exclusive report for full subscribers of The Memo. We detail Google and DeepMind model highlights, with a focus on the upcoming Gemini release. A PDF download link is at the end of this edition.
Notice: This is an independent report made available before the release of Google DeepMind Gemini. While this report is based on rigorous analysis of public and private information, some parts rely on informed estimates. For this reason, pre-release versions of this report should not be used for decision-making. A final version will be clearly marked and published after Gemini’s official announcement.
Abstract
Since Google’s introduction of the Transformer architecture in 2017, and the subsequent release of its pre-trained Transformer language model BERT in October 2018, training large language models (LLMs) has become a new space race, bringing humanity towards its largest evolutionary change yet: ‘superintelligence.’
Between 2020 and 2023, LLMs were trained on increasingly large datasets, by ever-larger teams of data scientists, with compute costs now measured in the hundreds of millions of dollars.
The information synthesized here covers the progress made by Google and DeepMind, which have presented as one company under the Alphabet umbrella since 2023. The focus is on the massive Gemini multimodal model, due for release in the US fall (in 2023, this means between 23 September and 21 December).
Contents
1. Background
1.1. Etymology
1.2. Google DeepMind: Two archers with one target
1.3. Gemini personnel and resources
1.4. Large language models
1.5. Text-to-image and visual language models
1.6. The Alpha series of AI systems
1.7. Putting it together: LLM + VLM + Text-to-image + Expertise
2. Datasets
2.1. Datasets: Text: MassiveText multilingual
2.2. Datasets: Visual
2.3. Datasets: Special
3. Gemini capabilities and performance
3.1. Languages
3.2. Visual
3.3. IQ
4. Closing estimates
5. Implementing and applying Gemini
6. Conclusion
7. Further reading
8. Appendix