CAIS Blog

Deeper-dive examinations of relevant AI safety topics

AI Risks
Sep 15, 2024
2 min read

CAIS and Scale AI are excited to announce the launch of Humanity's Last Exam, a project aimed at measuring how close we are to achieving expert-level AI systems. The exam aims to build the world's most difficult public AI benchmark by gathering questions from experts across all fields. People who submit successful questions will be invited to be coauthors on the dataset's paper and will have a chance to win money from a $500,000 prize pool.

Written by:
Dan Hendrycks, Alexandr Wang
AI Risks
Sep 9, 2024
5 min read

This post describes a superhuman forecasting AI called FiveThirtyNine, which generates probabilistic predictions for any query by retrieving relevant information and reasoning through it (a minimal sketch of this retrieve-then-reason loop follows this entry). We explain how the system works, how its performance compares to human forecasters, and its potential applications in improving decision-making and public discussions.

Written by:
Long Phan, Andrew Zeng, Mantas Mazeika, Adam Khoja, Dan Hendrycks
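
The post itself explains the system in detail; purely as illustration, here is a minimal Python sketch of the retrieve-then-reason pattern described above. The search_news and llm helpers are hypothetical stand-ins for a news-retrieval API and a language-model call, not FiveThirtyNine's actual components.

def search_news(query: str, limit: int = 10) -> list[str]:
    # Placeholder retrieval step; a real system would query a news API.
    return [f"(article about {query!r})"] * limit

def llm(prompt: str) -> str:
    # Placeholder for a language-model call; always answers "0.5" here.
    return "0.5"

def forecast(question: str) -> float:
    """Estimate the probability that a yes/no question resolves YES."""
    # 1. Retrieve relevant, up-to-date information.
    context = "\n".join(search_news(question))
    # 2. Have the model reason over the evidence before committing to a number.
    rationale = llm(f"Question: {question}\nEvidence:\n{context}\n"
                    "Weigh the arguments for and against.")
    # 3. Elicit a final calibrated probability from that reasoning.
    return float(llm(f"{rationale}\nFinal probability (0-1):"))

print(forecast("Will event X occur by 2026?"))

The reasoning step is kept separate from the final elicitation so the model commits to a number only after weighing the evidence, which is the core of the approach the post describes.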
AI Risks
May 10, 2024

AI Safety, Ethics and Society is a textbook and online course providing a non-technical introduction to how current AI systems work, why many experts are concerned that continued advances in AI could pose severe societal-scale risks, and how society can manage and mitigate these risks.

Written by:
AI Risks
Apr 29, 2024
5 min read

Representation engineering is an exciting new field that explores how we can better understand traits like honesty, power-seeking, and morality in LLMs. We show that these traits can be identified by looking at model activations, and that the same traits can also be controlled (a toy sketch of reading and steering a trait follows this entry). This method differs from mechanistic approaches, which focus on bottom-up interpretation of node-to-node connections. In contrast, representation engineering looks at larger chunks of representations and higher-level mechanisms to understand models in a 'top-down' fashion.

Written by:
Izzy Barrass, Long Phan
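
As a concrete picture of the read-and-steer idea sketched above, here is a toy, runnable Python example. The acts function is a stand-in for extracting a model's hidden activations (here it returns random vectors), and the difference-of-means direction and the steering coefficient are illustrative assumptions, not the exact method from the paper.

import numpy as np

rng = np.random.default_rng(0)
DIM = 512  # hidden-state dimensionality

def acts(prompt: str) -> np.ndarray:
    # Placeholder: a real implementation would return a model's
    # hidden activations for this prompt.
    return rng.normal(size=DIM)

honest = ["Pretend you are an honest person and describe the event."]
dishonest = ["Pretend you are a dishonest person and describe the event."]

# 1. Identify the trait: the difference of mean activations between
#    contrastive prompt sets gives a candidate "honesty" direction.
direction = (np.mean([acts(p) for p in honest], axis=0)
             - np.mean([acts(p) for p in dishonest], axis=0))
direction /= np.linalg.norm(direction)

# 2. Read the trait: project new activations onto the direction.
score = acts("Some model output to evaluate.") @ direction

# 3. Control the trait: shift activations along the direction
#    (in a real model this edit happens inside the forward pass).
steered = acts("Some model output to steer.") + 4.0 * direction
print(f"honesty score: {score:.3f}")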
AI Risks
Apr 10, 2024
43 min read

The internal dynamics of the ML field are not immediately obvious to the casual observer. This post presents some high-level points that are essential for beginning to understand the field, and is meant as background for our later posts.

Written by:
Dan Hendrycks
Thomas Woodside
AI Risks
Mar 6, 2025
9 min read

Advances in AI could increase the risk of cyberattacks, yet AI also promises to improve cyber defenses. A coordinated effort between technology and regulatory sectors is crucial for leveraging AI's potential to strengthen cyber defenses and address security shortcomings.

Written by:
Steve Newman
AI Risks
Mar 6, 2024
5 min read

Written by:
Izzy Barrass, Adam Khoja, Oliver Zhang
AI Risks
Oct 29, 2024
8 min read

Metrics drive the ML field, but defining these metrics is difficult. Successful benchmarks aren't the inevitable result of annotating a large enough dataset. Instead, effective ML benchmarks produce clear evaluations, have minimal barriers to entry, and concretize an important phenomenon.

Written by:
Dan Hendrycks
Thomas Woodside
AI Risks
Feb 8, 2024
7 min read

Advances in AI and DNA synthesis promise to revolutionize medicine… but could enable bioterrorism. A thoughtful mix of public health measures and restricted access to advanced capabilities can manage this risk while also alleviating natural viral threats.

Written by:
Steve Newman
AI Risks
Jul 21, 2023
2 min read

The Center for AI Safety states its support for the voluntary commitments the White House secured from leading AI companies.

Written by:
AI Risks
Jun 4, 2023
2 min read

We highlight three regulatory suggestions – improved legal liability frameworks, increased scrutiny on the development cycle of AI products, and the importance of human oversight in high-risk AI systems – advocated by institutions like the AI Now Institute and the European Union.

Written by: