asserted · Version 1 · Updated 2026-01-15

Constitutional AI Reduces Need for Human Harm Labels

Anthropic's Constitutional AI method trains AI assistants to be harmless using AI feedback rather than requiring human feedback labels for harmful content.

Entities

Topics

Citations

"We propose a method for training AI assistants to be harmless without human feedback labels for harms."

Constitutional AI: Harmlessness from AI Feedback — Abstract

Provenance

Author: missing.link
Created: 2026-01-15
Updated: 2026-01-15
Claim ID: clm_ghi11223

Changelog

v1 (2026-01-15): Initial claim created from Constitutional AI paper