benchmarks

People are using Super Mario to benchmark AI now

Micheal

Thought Pokémon was a tough benchmark for AI? One group of researchers argues that Super Mario Bros. is even tougher. ...

Did xAI lie about Grok 3’s benchmarks?

Micheal

Debates over AI benchmarks — and how they’re reported by AI labs — are spilling out into public view. This ...

AI isn’t very good at history, new paper finds

Micheal

AI might excel at certain tasks like coding or generating a podcast. But it struggles to pass a high-level history ...