SAN FRANCISCO, Nov. 20, 2025—Common Sense Media today released a comprehensive risk assessment finding that AI chatbots are fundamentally unsafe for teen mental health support. The research, conducted ...
Toolathlon is a benchmark to assess language agents' general tool use in realistic environments. It features 600+ diverse tools based on real-world software environments. Each task requires ...