New research looks at how leading AI models hold up doing actual white-collar work tasks, drawn from consulting, investment ...
Benchmark bonds set performance standards for other bonds. This article covers their definition, operation, and examples that illustrate their market significance..
On Thursday, Scale AI and the Center for AI Safety (CAIS) released Humanity's Last Exam (HLE), a new academic benchmark aiming to "test the limits of AI knowledge at the frontiers of human expertise," ...
The new benchmark, called Elephant, makes it easier to spot when AI models are being overly sycophantic—but there’s no current fix. Back in April, OpenAI announced it was rolling back an update to its ...
AI chatbots have been linked to serious mental health harms in heavy users, but there have been few standards for measuring whether they safeguard human well-being or just maximize for engagement. A ...
As more AI models show evidence of being able to deceive their creators, researchers from the Center for AI Safety and Scale AI have developed a first-of-its-kind lie detector. On Wednesday, the ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results