4 Comments

In their Gemini release (Dec 6), Google indicated that Gemini Ultra had achieved 90.0% on the MMLU benchmark using CoT@32, while most others report 5-shot results, as above. Google's release also puts GPT-4 at 87.29% using CoT@32. Do we know what Claude 3 Opus scores using the CoT approach? Is this significant? It would put Gemini Ultra (and maybe Claude 3 Opus) above the "human expert" level of 89.8% on MMLU per your scale at lifearchitect.ai/gpt-4-5.
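For context on the comparison above: CoT@32, as Google describes it, samples 32 chain-of-thought completions per question and majority-votes the extracted final answers, whereas 5-shot scoring takes a single greedy answer after five in-context examples. A minimal sketch of the majority-vote step (the `sample_fn` model call is a hypothetical placeholder, not any vendor's API):

```python
from collections import Counter

def cot_at_k(sample_fn, question, k=32):
    """CoT@k: draw k chain-of-thought samples for one question and
    return the majority-vote final answer.

    sample_fn is a hypothetical stand-in for a single model call that
    returns the extracted final answer letter (e.g. 'A'/'B'/'C'/'D').
    """
    answers = [sample_fn(question) for _ in range(k)]
    # most_common(1) gives [(answer, count)] for the plurality answer
    winner, _count = Counter(answers).most_common(1)[0]
    return winner
```

So a model that is right on a given question only, say, 60% of the time per sample can still score that question correct under CoT@32, which is one reason CoT@32 and 5-shot numbers are not directly comparable.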


Dr. Thompson! Alan! Have you seen this conversation with Claude 3? A really, really interesting deep dive into the phenomenology of being an AI: https://github.com/daveshap/Claude_Sentience/blob/main/conversation.md


https://twitter.com/alexalbert__/status/1764722513014329620?t=snPcMP4s1Lc_pZrkdfxFDA&s=19 This is a very interesting output from Claude 3 on a needle-in-a-haystack prompt. Can we discuss it on the livestream later? (DevonshireHillLad)
