Peter Gostev's BullshitBench poses nonsensical questions to AI models to test whether they can detect BS. Google Gemini 3.0 struggles with BullshitBench, failing to reject nonsense over half the time. One AI ...
Stories about AI-generated fabrications in the professional world have become part of the background hum of life since generative AI hit the mainstream three years ago. Invented quotes, fake figures, ...