Constitutional AI Reduces Need for Human Harm Labels
Anthropic's Constitutional AI method trains AI assistants to be harmless using AI-generated feedback, so it does not require human feedback labels that identify harmful outputs.
Entities
- Anthropic (developer)
Topics
- AI Safety
- Large Language Models
Citations
"We propose a method for training AI assistants to be harmless without human feedback labels for harms."
Provenance
| Author | missing.link |
| Created | 2026-01-15 |
| Updated | 2026-01-15 |
| Claim ID | clm_ghi11223 |
Changelog
v1 (2026-01-15): Initial claim created from Constitutional AI paper