"Benchwashing" - how do you defend against this?

1748 sats \ 10 comments \ @optimism 9 Aug 2025 AskSN

related

ELI5 of Autonomous Testing antithesis.com/docs/resources/autonomous_testing/

828 sats \ 4 comments \ @k00b 20 May devs AI

Detecting and reducing scheming in AI models - OpenAI openai.com/index/detecting-and-reducing-scheming-in-ai-models/

257 sats \ 1 comment \ @Scoresby 17 Sep 2025 AI

Scholars sneaking phrases into papers to fool AI reviewers www.theregister.com/2025/07/07/scholars_try_to_fool_llm_reviewers/

300 sats \ 5 comments \ @0xbitcoiner 8 Jul 2025 AI

OpenAI Secretly Funded Benchmarking Dataset Linked To o3 Model www.searchenginejournal.com/openai-secretly-funded-frontiermath-benchmarking-dataset/537760/

441 sats \ 0 comments \ @frostdragon 21 Jan 2025 tech

I hacked ChatGPT and Google's AI – and it only took 20 minutes www.bbc.com/future/article/20260218-i-hacked-chatgpt-and-googles-ai-and-it-only-took-20-minutes

775 sats \ 2 comments \ @StillStackinAfterAllTheseYears 19 Feb AI tech

Which AI Let You Down Today? - community tracker for llm performance decreases disaippointed.club/

423 sats \ 0 comments \ @Scoresby 20 Apr lol AI

Researchers poison stolen data to make AI results wrong www.theregister.com/2026/01/06/ai_data_pollution_defense/

266 sats \ 1 comment \ @0xbitcoiner 6 Jan AI

Plausibility is not truth | Robert Hutton | The Critic Magazine thecritic.co.uk/plausibility-is-not-truth/

580 sats \ 13 comments \ @Scoresby 22 Sep 2025 AI

Open Source AI Spam: Further Commentary

1570 sats \ 6 comments \ @halleck 10 May 2024 devs freebie

AI trained for treachery becomes the perfect agent - The Register www.theregister.com/2025/09/29/when_ai_is_trained_for/

257 sats \ 1 comment \ @Scoresby 30 Sep 2025 AI

Linus: Stop making issue of AI slop in kernel docs www.theregister.com/2026/01/08/linus_versus_llms_ai_slop_docs/

397 sats \ 0 comments \ @0xbitcoiner 8 Jan AI

How the novel nature of LLMs confounds our ability to evaluate them parsingphase.dev/tech/LLMs/psychologicalFactors.html

478 sats \ 0 comments \ @co574 24 Mar AI

Power to the people: How LLMs flip the script on technology diffusion karpathy.bearblog.dev/power-to-the-people/

247 sats \ 0 comments \ @billytheked 8 Apr 2025 AI

Michael Burry accuses Big Tech of billion-dollar fraud involving Nvidia chips.

820 sats \ 1 comment \ @economy 10 Nov 2025 Stacker_Stocks

AI benchmarks hampered by bad science www.theregister.com/2025/11/07/measuring_ai_models_hampered_by/

208 sats \ 0 comments \ @0xbitcoiner 10 Nov 2025 AI

LLMs’ impact on science: Booming publications, stagnating quality arstechnica.com/science/2025/12/llms-impact-on-science-booming-publications-stagnating-quality/

268 sats \ 0 comments \ @0xbitcoiner 18 Dec 2025 science

US closes probe into Waymo self-driving collisions, unexpected behavior www.reuters.com/legal/litigation/us-closes-probe-into-waymo-self-driving-collisions-unexpected-behavior-2025-07-25

258 sats \ 0 comments \ @Coinsreporter 26 Jul 2025 Cartalk

Openwashing www.nytimes.com/2024/05/17/business/what-is-openwashing-ai.html

266 sats \ 7 comments \ @south_korea_ln 26 Oct 2024 tech

Kernel Devs Debate LLM Code Quality Concerns as AI-Generated Patches Increase biggo.com/news/202508240724_Kernel_Developers_Debate_LLM_Code_Quality

210 sats \ 0 comments \ @ch0k1 8 Mar AI devs

Small LLMs Can Beat Large Ones at 5-30x Lower Cost with Automated Data Curation www.tensorzero.com/blog/fine-tuned-small-llms-can-beat-large-ones-at-5-30x-lower-cost-with-programmatic-data-curation/

304 sats \ 1 comment \ @carter 5 Aug 2025 AI

GrapheneOS on Google's new reCAPTCHA approach x.com/GrapheneOS/status/2053507778719748465

543 sats \ 6 comments \ @Scoresby 11 May tech