Constitutional AI: Harmlessness from AI Feedback
| Publisher | Anthropic |
| Author | Yuntao Bai et al. |
| URL | https://arxiv.org/abs/2212.08073 |
| Access date | 2026-01-15 |
| Published | 2022-12-15 |
| Source ID | src_i9j0k1l2 |
Excerpt
We propose a method for training AI assistants to be harmless without human feedback labels for harms. The method involves both supervised learning and reinforcement learning from AI feedback.