asserted · Version 1 · Updated 2026-01-15

Constitutional AI Reduces Need for Human Harm Labels

Anthropic's Constitutional AI method trains AI assistants to be harmless using AI feedback rather than requiring human feedback labels for harmful content.

Entities

Anthropic (developer)

Topics

AI Safety
Large Language Models

Citations

"We propose a method for training AI assistants to be harmless without human feedback labels for harms."

Constitutional AI: Harmlessness from AI Feedback — Abstract

Provenance

Author	missing.link
Created	2026-01-15
Updated	2026-01-15
Claim ID	`clm_ghi11223`

Changelog

v1 (2026-01-15): Initial claim created from Constitutional AI paper