AI is actually bad at math, ORCA shows www.theregister.com/2025/11/17/ai_bad_math_orca/

197 sats \ 4 comments \ @0xbitcoiner 18 Nov 2025 AI

related

The ORCA Benchmark Evaluates How Well AIs Deal with Everyday Math www.omnicalculator.com/reports/omni-research-on-calculation-in-ai-benchmark

260 sats \ 0 comments \ @0xbitcoiner 27 Feb AI

AI model finally learns to say ‘I don’t know’ www.independent.co.uk/tech/ai-model-chatbot-overconfidence-don-t-know-b2974020.html

474 sats \ 1 comment \ @south_korea_ln 12 May AI science

The AI Revolution in Math Has Arrived www.quantamagazine.org/the-ai-revolution-in-math-has-arrived-20260413/

355 sats \ 1 comment \ @0xbitcoiner 13 Apr math AI

In a First, AI Models Analyze Language As Well As a Human Expert www.quantamagazine.org/in-a-first-ai-models-analyze-language-as-well-as-a-human-expert-20251031/

274 sats \ 0 comments \ @0xbitcoiner 31 Oct 2025 AI

To Make Language Models Work Better, Researchers Sidestep Language www.quantamagazine.org/to-make-language-models-work-better-researchers-sidestep-language-20250414/

210 sats \ 0 comments \ @0xbitcoiner 15 Apr 2025 AI

Is language the same as intelligence? The AI industry desperately needs it to be www.theverge.com/ai-artificial-intelligence/827820/large-language-models-ai-intelligence-neuroscience-problems

645 sats \ 9 comments \ @0xbitcoiner 25 Nov 2025 AI

Debate May Help AI Models Converge on Truth www.quantamagazine.org/debate-may-help-ai-models-converge-on-truth-20241108/

258 sats \ 0 comments \ @0xbitcoiner 8 Nov 2024 science

Demis Hassabis Thinks AI Job Cuts Are Dumb www.wired.com/story/demis-hassabis-ai-layoffs-deepmind-google-io/

426 sats \ 2 comments \ @0xbitcoiner 19 May AI

Wairdle - a followup about AI's struggles

434 sats \ 11 comments \ @crrdlx 12 Aug 2025 AI

GDPval: Measuring the performance of our models on real-world tasks - OpenAI openai.com/index/gdpval/

388 sats \ 8 comments \ @Scoresby 2 Oct 2025 AI

Anthropic’s new model refuses to find smart contract vulnerabilities protos.com/anthropics-new-model-refuses-to-find-smart-contract-vulnerabilities/

402 sats \ 4 comments \ @0xbitcoiner 10 Jun AI

AI benchmarks hampered by bad science www.theregister.com/2025/11/07/measuring_ai_models_hampered_by/

208 sats \ 0 comments \ @0xbitcoiner 10 Nov 2025 AI

AI Still Can't Think: Apple’s New Study Dispels the Myth

365 sats \ 2 comments \ @lunin 31 Jul 2025 AI

DeepSeek-V4-Pro | Advanced AI Model for Reasoning & Code deepseeksr1.com/v4-pro/

219 sats \ 0 comments \ @hasherstacker 17 May AI

Why it’s a mistake to ask chatbots about their mistakes arstechnica.com/ai/2025/08/why-its-a-mistake-to-ask-chatbots-about-their-mistakes/

251 sats \ 0 comments \ @kepford 13 Aug 2025 AI

Why OpenAI’s solution to AI hallucinations would kill ChatGPT tomorrow theconversation.com/why-openais-solution-to-ai-hallucinations-would-kill-chatgpt-tomorrow-265107

618 sats \ 25 comments \ @south_korea_ln 17 Sep 2025 AI

The Oppenheimer of AI lawliberty.org/book-review/the-oppenheimer-of-ai/

246 sats \ 0 comments \ @0xbitcoiner 14 Jun AI BooksAndArticles

Sam Altman Warns AI Development Needs An Energy 'Breakthrough' finance.yahoo.com/news/sam-altman-warns-ai-development-170018746.html

248 sats \ 1 comment \ @ch0k1 6 Apr 2024 tech

AI model collapse is not what we paid for www.theregister.com/2025/05/27/opinion_column_ai_model_collapse/

451 sats \ 0 comments \ @k00b 28 May 2025 tech

An OpenAI model solved a famous math problem that stumped humans for 80 years arstechnica.com/ai/2026/06/openais-math-breakthrough-played-to-ais-strengths/

338 sats \ 0 comments \ @0xbitcoiner 1 Jun AI math

BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs arxiv.org/abs/2510.04721

210 sats \ 1 comment \ @jakoyoh629 25 Oct 2025 AI