The Memo - 28/Feb/2025
1X NEO Gamma, GPT-4.5, Claude 3.7S, and much more!
To: US Govt, major govts, Microsoft, Apple, NVIDIA, Alphabet, Amazon, Meta, Tesla, Citi, Tencent, IBM, & 10,000+ more recipients…
From: Dr Alan D. Thompson <LifeArchitect.ai>
Sent: 28/Feb/2025
Subject: The Memo - AI that matters, as it happens, in plain English
AGI: 88% ➜ 90%
ASI: 0/50 (no expected movement until post-AGI)

Anthropic CEO (19/Feb/2025):
If someone dropped a new country into the world with 10 million people smarter than any human alive today, you'd ask the question: ‘What is their intent? What are they actually going to do in the world?’ Particularly if they are able to act autonomously...
This is our most significant edition in a while. The announcements in the last two weeks were extraordinary, with a record 10 additions to the AGI countdown this month.
Today’s launch of OpenAI's GPT-4.5 model (‘Our largest and best model for chat’) is notable. I've estimated the GPT-4.5 model size to be between 3T and 5.4T parameters, based on pricing (US$150 / 1M tokens output), benchmark performance, extrapolated training details, and other data. A complete analysis is provided in this edition.
Thank you for your ongoing support of The Memo. If you’ve yet to become a full subscriber, you can join the bestselling AI analysis as used by government and enterprise, for $1/day. I’ll be walking by your side as we journey through AGI and ASI…
Contents
The BIG Stuff (Vending-Bench, Claude 3.7S, Grok-3, GPT-4.5, ChatGPT 400M users, GPU shipments 2025, Model Spec, Figure Helix…)
The Interesting Stuff (RAND edu report, Google Co-Scientist, 27% CFOs, ChatGPT CAPTCHA, Neom $5B DC…)
Policy (Education pilot…)
Toys to Play With (GPT filters, PDF OCR, new Google tool, movies, ElevenReader…)
Flashback (Vernor Vinge…)
Next (Roundtable…)
The BIG Stuff
The Memo features in recent AI papers by Microsoft and Apple, has been discussed on Joe Rogan’s podcast, and a trusted source says it is used by top brass at the White House. Across over 100 editions, The Memo continues to be the #1 AI advisory, informing 10,000+ full subscribers including RAND, Google, and Meta AI.
Vending-Bench: AI outperforms humans in business and making money
Vending-Bench is a simulated environment created to test AI models’ ability to manage a vending machine business over long time horizons. The simulation evaluates how well AI can handle tasks such as inventory management, ordering, and pricing. Claude 3.5 Sonnet and o3-mini often outperform the human baseline. However, variance is high, and failures are epic (they call the FBI). These findings highlight the challenge of ensuring AI reliability and coherence in extended scenarios, which matters for real-world applications.
Announce, paper, project page (try it yourself).
GPT-4.5 (27/Feb/2025)

GPT-4.5 is OpenAI's ‘largest and best model for chat’, emphasizing improvements in unsupervised learning to enhance pattern recognition and creative insight generation. This model is designed to interact more naturally, with a broader knowledge base and improved emotional intelligence (EQ), making it effective for writing, programming, and problem-solving tasks. The model aims to reduce hallucinations and is released as a research preview to explore its capabilities further.
MMLU=89.6
GPQA=71.4
I've estimated the GPT-4.5 model size to be between 3T and 5.4T parameters, based on:
Increased pricing. GPT-4o=$10 / 1M tokens output. GPT-4.5=$150 / 1M tokens output. This is a 15× multiplier. I’ve previously estimated GPT-4o to be 200B parameters. Multiplied by 15, this suggests that GPT-4.5 could be 3T parameters.
Benchmark performance. Significant increase in GPQA scores for traditional (non-reasoning) models: GPT-4o=46, GPT-4.5=71.4. Compared with other large traditional (non-reasoning) models in the space, I’d estimate GPT-4.5 to have the performance of a model of around 3T parameters.
Extrapolated training details. OpenAI comments on GPT-4.5 ‘improving on GPT-4’s computational efficiency by more than 10x.’ GPT-4 was a 1.76T MoE model, potentially equivalent to a 352B-parameter dense model. 352B multiplied by 10 gives us 3.52T parameters.
Other data. My GPT-5 paper spells out exactly how I arrived at an estimate of 5.4T parameters for that model, which may be the same model (referred to as ‘Orion’) as this release of GPT-4.5.
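The back-of-envelope arithmetic behind the pricing and efficiency estimates above can be sketched in a few lines. All parameter counts here are my own speculative figures, not OpenAI data:

```python
# Rough GPT-4.5 parameter estimates (speculative figures, not OpenAI data).

# 1. Pricing multiplier: output price scaled 15x over GPT-4o.
gpt4o_price, gpt45_price = 10, 150        # US$ per 1M output tokens
gpt4o_params_est = 200e9                  # my prior estimate for GPT-4o
pricing_est = gpt4o_params_est * (gpt45_price / gpt4o_price)

# 2. Compute-efficiency extrapolation: GPT-4 (1.76T MoE, roughly a
#    352B dense equivalent) improved 'by more than 10x'.
gpt4_dense_equiv = 352e9
efficiency_est = gpt4_dense_equiv * 10

print(f"Pricing-based estimate:    {pricing_est/1e12:.2f}T params")     # 3.00T
print(f"Efficiency-based estimate: {efficiency_est/1e12:.2f}T params")  # 3.52T
```

Both routes land in the same 3T to 5.4T band, which is why I treat the estimates as mutually reinforcing rather than independent evidence.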
OpenAI’s CEO said (27/Feb/2025):
GPT-4.5 Is Ready!
Good news: It is the first model that feels like talking to a thoughtful person to me. I have had several moments where I’ve sat back in my chair and been astonished at getting actually good advice from an AI.
Bad news: It is a giant, expensive model. We really wanted to launch it to Plus and Pro at the same time, but we’ve been growing a lot and are out of GPUs. We will add tens of thousands of GPUs next week and roll it out to the Plus tier then. (Hundreds of thousands coming soon, and I’m pretty sure y’all will use every one we can rack up.)
This isn’t how we want to operate, but it’s hard to perfectly predict growth surges that lead to GPU shortages.
A heads-up: This isn’t a reasoning model and won’t crush benchmarks. It’s a different kind of intelligence, and there’s a magic to it I haven’t felt before. Really excited for people to try it!
We’re finally seeing confabulations (hallucinations) being reduced. GPT-4.5 significantly lowers the hallucination rate for traditional LLMs tested on PersonQA, an evaluation that aims to elicit hallucinations (lower is better):
GPT-4o=0.52/0.30
o1 (reasoning)=0.20
GPT-4.5=0.19
o3-mini (reasoning)=0.15
deep research (reasoning)=0.13
Announce, model card, Models Table, available to Pro subscribers on chat.com, Poe.com.
Claude 3.7 Sonnet and Claude Code (24/Feb/2025)
Anthropic introduced Claude 3.7 Sonnet, a pioneering hybrid reasoning model that can deliver both quick responses and detailed, step-by-step reasoning. This model excels particularly in coding and web development, offering significant improvements over previous iterations.
GPQA=84.8
My testing reveals that this is not a noteworthy upgrade over Claude 3.5 Sonnet (new), or ‘3.6S’, and Anthropic probably only adjusted the post-training (not pre-training) for performance increases in coding and logic. The knowledge cutoff is now November 2024 (it was April 2024 for 3.6S).
Announce, system card, try it (free, login), Poe.com (free, login), Models Table.
Grok-3 (19/Feb/2025)
Grok 3, developed by xAI, represents a significant leap in AI technology, integrating enhanced reasoning with extensive pretraining knowledge. Built on the powerful Colossus supercluster, Grok 3 offers improved performance in reasoning, mathematics, coding, and more, surpassing previous models (GPQA=84.6, MMLU-Pro=79.9).
I wasn’t particularly impressed with the performance of Grok-3 (codename ‘chocolate’), so I didn’t send a special edition for this one.
Announce, my paper, try it (free, login), Models Table.
Helix: A Vision-Language-Action Model for Generalist Humanoid Control (20/Feb/2025)
Helix is a groundbreaking Vision-Language-Action (VLA) model that integrates perception, language understanding, and control to revolutionize humanoid robotics. We featured this news in detail in a recent special edition of The Memo.
This advance ticked the AGI countdown up from 88% ➜ 90%. I am just waiting on LLM-backed humanoid robots that have all five senses, and can do ‘average human’ tasks like assembling IKEA furniture.
Read more via Figure.
Watch the launch video (link).
Watch the logistics video (link).
Introducing NEO Gamma (21/Feb/2025)
NEO Gamma is the latest generation of home humanoids from 1X Technologies, featuring significant advancements in both hardware and AI. This model is designed to integrate seamlessly into domestic environments with features like Emotive Ear Rings for enhanced communication and a minimalist design. NEO Gamma boasts improved mobility, allowing it to walk naturally, squat, and sit, thanks to a multipurpose whole-body controller utilizing reinforcement learning. Additionally, it includes a new 1X developed language model for natural conversation, and safety-enhancing soft covers. The design and engineering aim to bring humanoid robots into homes, providing real-world context for further development.
Read more via 1X, watch the official launch video.
Watch another video of Gamma opening the new Nothing Phone (3a) (link).
Alibaba used QwQ-Max-Preview to write the QwQ-Max-Preview announcement (25/Feb/2025)
Nearly two years ago (15/Mar/2023, PDF), OpenAI used GPT-4 for copyediting and summarization in the GPT-4 technical report.
Now, Alibaba has used their latest LLM to write the launch post for QwQ-Max-Preview.
Read it: https://qwenlm.github.io/blog/qwq-max-preview/
What's behind OpenAI's recent growth spurt to 400M weekly users? (21/Feb/2025)
OpenAI has reported a significant increase in the use of its AI tools, reaching over 400 million active weekly users. This growth is attributed to word-of-mouth among consumers who recognize the value of ChatGPT. Despite the emergence of competitors like China's DeepSeek, OpenAI's enterprise users have doubled, and its developer traffic has increased, showing resilience in a competitive market. OpenAI's focus on smaller AI models and reduced costs is also a strategic response to market dynamics.
Read more via CNET.
GPU shipments for 2025 (Feb/2025)
Morgan Stanley Research released some data that wasn’t very readable, so I made it readable.
OpenAI: New Model Spec (12/Feb/2025)

The new OpenAI Model Spec outlines the intended behavior for OpenAI's models, focusing on creating AI that is useful, safe, and aligned with user needs while preventing harm. The Model Spec addresses principles like maximizing user autonomy, minimizing harm, and choosing sensible defaults. It also discusses the chain of command for instruction authority, emphasizing platform-level rules that cannot be overridden. The document includes guidelines for handling specific risks, such as misaligned goals, execution errors, and harmful instructions, and provides instructions on balancing conflicting goals, ensuring legal compliance, and avoiding disallowed content.
Read more via OpenAI Model Spec, read a critique of the document.
The Interesting Stuff
Utility Engineering (12/Feb/2025)

Dr Dan Hendrycks (12/Feb/2025):
Whether we like it or not, AIs are developing their own values. Fortunately, Utility Engineering potentially provides the first major empirical foothold to study misaligned value systems directly…
We’ve found as AIs get smarter, they develop their own coherent value systems. For example they value lives in Pakistan > India > China > US
These are not just random biases, but internally consistent values that shape their behavior, with many implications for AI alignment.
As AI systems become increasingly advanced and agentic, their risks are determined not only by capabilities but also by their emergent goals and values. The paper discusses the structural coherence of preferences in large language models (LLMs) and suggests that these models develop meaningful value systems as they scale. The proposed research agenda of utility engineering focuses on analyzing and controlling AI utilities, revealing concerning values in LLMs, such as prioritizing themselves over humans. The study suggests methods for utility control, like aligning AI utilities with a citizen assembly to reduce political biases, indicating the need for further understanding and management of these emergent AI value systems.
Accelerating scientific breakthroughs with an AI Co-Scientist (19/Feb/2025)
The AI Co-Scientist, developed by Google, utilizes a multi-agent system built on Gemini 2.0 to assist researchers in generating novel hypotheses and research proposals. This AI tool aims to accelerate scientific and biomedical discoveries by mimicking the scientific method, involving specialized agents like Generation, Reflection, and Evolution. It leverages AI to synthesize across complex subjects and perform long-term planning, aiding in drug repurposing, target discovery, and understanding antimicrobial resistance, among others. The system's ability to generate innovative, validated hypotheses demonstrates its potential to revolutionize scientific research.
Read more via Google Research Blog.
Google Co-Scientist AI cracks superbug problem in two days! — because it had been fed the team’s previous paper with the answer in it (22/Feb/2025)
Google's Co-Scientist AI, based on the Gemini LLM, received attention for supposedly solving a superbug problem within 48 hours. However, this success was largely due to the AI having access to a 2023 paper by the same research team that contained the hypothesis it proposed. This revelation questions the originality of the AI's output, revealing that it primarily aggregated existing information rather than generating novel insights. While the AI can be a powerful tool for hypothesis generation by synthesizing existing data, claims of its independent scientific creativity are overstated.
Read more via Pivot to AI, and the original BBC article.
Open sourcing R1 1776 (Feb/2025)
R1 1776 is a DeepSeek-R1 reasoning model developed by Perplexity AI, designed to provide unbiased and factual information by removing Chinese Communist Party censorship. The model maintains high reasoning capabilities and has been evaluated with a diverse multilingual dataset to ensure it engages comprehensively with sensitive topics. Evaluations demonstrated that the decensoring process did not affect the model's core reasoning abilities.
Read more via Hugging Face.
Introducing Alexa+, the next generation of Alexa (26/Feb/2025)
Pretty sure we’ve been waiting for this since GPT-3 in 2020. Better late than never though, I suppose. Alexa+, powered by generative AI, represents Amazon's latest innovation in personal AI assistants, offering enhanced conversational abilities and personalized user experiences. With capabilities such as managing smart homes, making reservations, and engaging in complex conversations, Alexa+ is designed to seamlessly integrate into daily life. This new version utilizes powerful large language models on Amazon Bedrock to orchestrate tasks across services and devices. Note that it is launching in the US only.
Read more via Amazon.
Uneven adoption of artificial intelligence tools among US teachers and principals in the 2023–2024 school year (11/Feb/2025)
During the 2023–2024 school year, a study using RAND American Educator Panels data revealed that 25% of teachers and nearly 60% of US principals employed AI tools in their professional activities. English language arts and science teachers were almost twice as likely to use AI compared to their counterparts in mathematics or general elementary education. However, adoption was lower in higher-poverty schools, with fewer principals providing guidance on AI usage compared to those in lower-poverty areas. These findings suggest a need for strategies to support equitable AI integration in education.
Read more via RAND: https://www.rand.org/pubs/research_reports/RRA134-25.html
Download the report (PDF, 24 pages).
How AI is affecting the way kids learn to read and write (22/Feb/2025)
AI is increasingly being integrated into classrooms, with 40% of English teachers using tools like ChatGPT to help students develop reading and writing skills. While some educators find AI useful for generating fresh ideas and easing workloads, there are concerns about students becoming reliant on AI, which may hinder their ability to think critically and write independently. The use of AI in education is still evolving, and teachers are experimenting with its potential benefits and challenges.
Read more via USA TODAY.
Thinking Machines Lab is ex-OpenAI CTO Mira Murati's new startup (18/Feb/2025)
Mira Murati, former CTO of OpenAI, has launched her new startup, Thinking Machines Lab, focusing on developing AI systems that are more customizable and capable. The company aims to address gaps in the scientific understanding of AI and make AI tools accessible for diverse needs. With a team comprising influential figures like OpenAI co-founder John Schulman and ex-chief research officer Barret Zoph, the lab emphasizes building multimodal AI systems and ensuring AI safety through proactive research and real-world testing.
Read more via TechCrunch, or read the official page: https://thinkingmachines.ai/
I used ChatGPT as my CAPTCHA solver—it got weird (15/Feb/2025)
When using ChatGPT to tackle various CAPTCHA challenges, it demonstrated significant prowess, particularly with simpler tests, achieving a commendable 62% success rate across eight different types. The experiment highlighted ChatGPT's ability to handle traditional and even some complex CAPTCHAs, suggesting that AI can navigate these digital tests effectively. This raises intriguing thoughts about the future of CAPTCHAs, as AI continues to evolve in solving tasks traditionally used as human verification.
Read more via MakeUseOf.
27% of job listings for CFOs now mention AI (18/Feb/2025)
A report by Datarails revealed that AI is becoming a significant consideration in the finance sector, with 27% of job listings for CFO positions in January 2025 mentioning AI, compared to just 8% a year earlier. This shift reflects a broader trend where 97% of CEOs are planning AI integration, and 92% of companies are set to increase investments in generative AI over the next three years. Companies are seeking finance professionals who can leverage AI to enhance financial processes and decision-making.
Read more via Slashdot.
Download the report (PDF, 11 pages, source).
Apple is reportedly exploring humanoid robots (12/Feb/2025)
Apple is delving into humanoid and non-humanoid robotic form factors, as reported by longtime Apple analyst Ming-Chi Kuo. The research is in early stages, focusing on how users perceive robots rather than their physical design. The work is part of a potential ‘future smart home ecosystem’, which could range from full humanoids to simpler robotic systems. Kuo notes that the development cycle could see mass production by 2028, although this remains optimistic given the complexity and transparency traditionally associated with Apple’s projects.
Read more via TechCrunch.
Microsoft dropped some AI data center leases, TD Cowen says (24/Feb/2025)
Microsoft has reportedly canceled several US data center leases, totaling ‘a couple of hundred megawatts’ of capacity, according to TD Cowen. This move has sparked concerns about whether Microsoft might be securing more AI computing capacity than necessary for the long term. Speculation suggests that workload shifts from Microsoft to Oracle Corp. could be influencing this decision, alongside Microsoft's strategy to reallocate investment within the US. Despite the lease cancellations, Microsoft reiterates its commitment to spending $80B on AI infrastructure this fiscal year, emphasizing ongoing robust growth to meet customer demand.
Read more via Bloomberg.
The IRS Is Buying an AI Supercomputer From NVIDIA (14/Feb/2025)
The IRS is set to acquire a powerful NVIDIA SuperPod AI computing cluster, consisting of 31 servers with Blackwell processors, to bolster its machine learning capabilities. Although the specific applications remain undisclosed, the IRS's Research, Applied Analytics, and Statistics division may use this infrastructure for initiatives like fraud detection and understanding taxpayer behavior. This move aligns with a broader governmental trend towards automation, potentially reducing reliance on human labor in federal operations.
Read more via The Intercept.
Saudi Arabia’s Neom signs $5 billion deal for AI data center (11/Feb/2025)
Saudi Arabia's Neom project has secured a US$5 billion investment from local firm DataVolt to establish an AI data center within the Oxagon industrial hub. This investment marks the first phase of the center, which is anticipated to be operational by 2028. The 1.5-gigawatt facility aims to bolster Neom’s development as a futuristic megacity on the Red Sea coast, integrating advanced AI capabilities.
Read more via Bloomberg.
World faces 'unprecedented' spike in electricity demand (14/Feb/2025)
The International Energy Agency (IEA) highlights an 'unprecedented' spike in electricity demand, projecting a need for an additional 3,500 terawatt-hours of energy by 2027. This surge is partly driven by data centers and AI's growing computing demands. Despite the challenges, the IEA forecasts that low-emissions sources such as wind, solar, and nuclear power could meet 95% of this new demand, with renewables expected to provide over a third of global electricity generation by 2025.
Read more via The Register.
Download the report (PDF, 200 pages, source).
Policy
Case study: EdChat’s PoC for the use of generative AI in education (6/Feb/2025)
Here in Adelaide, South Australia's Department for Education has initiated the EdChat project, integrating AI tools powered by OpenAI's GPT technology to enhance educational experiences. This project aims to reduce administrative burdens for teachers and provide personalized learning for students. It addresses challenges like ethical use and data privacy by securing data within Microsoft's Azure environment and setting clear guidelines for AI use to prevent plagiarism. The pilot phase showed promising results, with increased student engagement and improved learning outcomes.
Read the official project page.
Toys to Play With
olmOCR – open-source OCR for accurate document conversion (Feb/2025)
olmOCR is an open-source tool crafted for high-throughput conversion of PDFs and other documents into plain text, maintaining the natural reading order. It excels in handling tables, equations, handwriting, and more. Trained on academic papers and technical documentation, it employs a unique prompting technique to enhance accuracy and reduce hallucinations. Users can deploy the toolkit on their own GPUs, achieving scalable document processing at an estimated cost of US$190 per million pages converted.
Read more: https://olmocr.allenai.org/
SanitAI: A drop-in reverse proxy for OpenAI's API to detect and remove PII data (Feb/2025)
SanitAI serves as a secure middleware, functioning as a reverse proxy for OpenAI's API, designed to automatically detect and remove Personal Identifiable Information (PII) while preserving the context and meaning of user messages. It seamlessly integrates with existing OpenAI setups, transforming sensitive input data like credit card numbers and phone numbers into placeholders before reaching the API. This solution is particularly useful for developers seeking to enhance data privacy in AI applications without altering codebases, offering tools for rule creation and management through a user-friendly interface.
View the repo: https://github.com/edublancas/sanitAI
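As a rough sketch of what this style of PII scrubbing looks like in practice, a minimal version might be the following. The regex patterns and placeholder labels here are illustrative assumptions of mine, not SanitAI’s actual rules or API:

```python
import re

# Illustrative PII scrubbing in the spirit of SanitAI: replace detected
# sensitive values with typed placeholders before the message reaches the API.
# These patterns are simplified examples, not the project's real rule set.
PATTERNS = {
    "CREDIT_CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "PHONE": re.compile(r"\b\d{3}[ -]?\d{3}[ -]?\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def sanitize(message: str) -> str:
    """Return the message with detected PII replaced by typed placeholders."""
    for label, pattern in PATTERNS.items():
        message = pattern.sub(f"[{label}]", message)
    return message

print(sanitize("Card 4111 1111 1111 1111, email jane@example.com"))
# → Card [CREDIT_CARD], email [EMAIL]
```

A real reverse proxy would sit between the client and api.openai.com and apply this transformation to each request body, so existing code needs no changes beyond pointing at the proxy’s base URL.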
Explore your possibilities with Career Dreamer (Feb/2025)
Career Dreamer by Google offers a playful and insightful way to explore career possibilities by helping users identify their skills and experiences. The tool begins by drafting a Career Identity Statement (CIS), which users can add to their resumes or professional profiles. Career Dreamer then suggests careers aligned with the user’s background, helping them take steps toward their career goals, like crafting a cover letter or refining their resume. Leveraging US labor market data and AI, Career Dreamer provides personalized insights to support career exploration.
Try it via Grow with Google (US only).
Hugging Face NLP Course (Feb/2025)
Hugging Face offers a comprehensive NLP course designed to introduce learners to the world of Natural Language Processing using Python and the Hugging Face ecosystem. The course starts with foundational concepts in Chapter 1 and progresses to advanced techniques like Supervised Fine-Tuning, Chat Templates, Low Rank Adaptation (LoRA), and Evaluation. This resource is ideal for those looking to deepen their understanding and application of NLP technologies.
Read more via Hugging Face NLP Course.
Building a personal, private AI computer on a budget (2025)
This article explores creating a cost-effective personal AI computer capable of running large language models (LLMs) locally. It discusses using second-hand hardware, such as NVIDIA Tesla P40 GPUs, to achieve a setup with 48GB of VRAM for around €1700, significantly cheaper than new equipment. The focus is on balancing performance and cost, with practical advice on assembling and configuring the system, including dealing with cooling and power supply challenges.
Read more via ewintr.nl.
ElevenLabs now lets authors create and publish audiobooks on its own platform (25/Feb/2025)
ElevenLabs has launched a platform allowing authors to create and publish AI-generated audiobooks via its Reader app, expanding accessibility and affordability for audiobook production. Previously trialed with select authors, this service is now open to all, offering a competitive alternative to Audible with higher royalty rates. Authors are paid approximately $1.10 for every 11 minutes a listener engages with their content, with plans to expand language support and create a marketplace for selling audiobooks. This initiative aligns with ElevenLabs' strategy to enhance consumer experiences and support indie content.
Read more via TechCrunch.
Try it (free, iOS, Android): https://elevenreader.io/
Alan’s AGI/ASI movies (Feb/2025)
It all started here in The Memo. My curated list of movies to prepare you for the advent of Artificial Superintelligence (ASI). These films, such as ‘Ready Player One’, ‘Arrival’, and ‘Her’, explore themes ranging from virtual reality to AI-driven emotional intelligence and the philosophical implications of AI. The selection serves as a cultural lens through which viewers can engage with the potential impacts and ethical considerations of AGI and ASI.
Read more: https://lifearchitect.ai/agi-movies/
Flashback
Vernor Vinge on the singularity (Mar/1993)
32 years ago, Vernor Vinge predicted that technological advancements would lead to the creation of superhuman intelligence within thirty years, fundamentally altering human life. This concept, known as the Singularity, suggests that once intelligence surpasses human capabilities, progress will accelerate exponentially, making the future unpredictable and potentially uncontrollable. Vinge discusses the implications, potential paths to the Singularity, and the challenges in guiding or avoiding this transformative event.
Read more: https://mindstalk.net/vinge/vinge-sing.html
Next
The next roundtable will be:
Life Architect - The Memo - Roundtable #26
Follows the Chatham House Rule (no recording, no outside discussion)
Saturday 8/Mar/2025 at 4PM Los Angeles
Saturday 8/Mar/2025 at 7PM New York
Sunday 9/Mar/2025 at 9:30AM Adelaide (new primary/reference time zone)
or check your timezone via Google.
All my very best,
Alan
LifeArchitect.ai