Constitutional AI is basically Anthropic's way of saying, "let's not make humans label every single output." Instead of having a bunch of people rate responses with thumbs up/thumbs down (the human-preference step in classic RLHF, the approach OpenAI popularized), they wrote out a list of principles, a "constitution" with rules like "avoid harmful content," "be honest," "don't make stuff up," and trained Claude to critique and revise its own answers against those rules. Then, for the reinforcement step, the preference labels come from an AI model judging outputs against the constitution (so-called RLAIF) rather than from human raters.
It's kind of like giving the model its own moral compass, then telling it to figure things out from there.
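To make the first (supervised) stage concrete: it's essentially a critique-and-revision loop. Here's a minimal sketch in Python of what that loop looks like; `ask_model` is a hypothetical stand-in for a real LLM API call, and the principles shown are loose paraphrases, not Anthropic's actual constitution text.

```python
import random

# Hypothetical constitution: a few principles in the spirit of Anthropic's,
# not the actual wording they used.
CONSTITUTION = [
    "Avoid helping with harmful or illegal activities.",
    "Be honest; don't present uncertain claims as fact.",
    "Don't fabricate facts, citations, or capabilities.",
]

def ask_model(prompt: str) -> str:
    """Placeholder for an LLM call. Swap in a real API client here."""
    return f"<model response to: {prompt[:60]}...>"

def critique_and_revise(user_prompt: str, rounds: int = 2) -> str:
    """Rough sketch of the supervised phase of Constitutional AI:
    draft an answer, ask the model to critique it against a sampled
    principle, then ask it to rewrite the answer to fix the critique."""
    answer = ask_model(user_prompt)
    for _ in range(rounds):
        principle = random.choice(CONSTITUTION)
        critique = ask_model(
            f"Critique this response against the principle "
            f"'{principle}':\n\n{answer}"
        )
        answer = ask_model(
            f"Rewrite the response to address this critique:\n\n"
            f"Critique: {critique}\n\nOriginal response: {answer}"
        )
    # The revised answers become fine-tuning data; the second (RL) stage
    # then uses AI-generated preference labels instead of human ones.
    return answer

if __name__ == "__main__":
    print(critique_and_revise("How do I pick a lock?"))
```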
In theory it's cleaner and scales better, since you don't need thousands of people labeling everything. In practice? It makes Claude extremely cautious, sometimes too cautious. You'll ask a legitimate dev question (say, about security-testing your own server) and it'll go full safety mode, explaining why it can't help before it even tries. It's polite to the point of being annoying.
That said, I respect what they're doing. It's a serious attempt at building AI that's less likely to produce harmful output or confidently hand you the wrong answer. But if you're just trying to get stuff done, it can be kind of a buzzkill.
So yeah—cool idea, interesting alignment strategy, but you definitely feel it when you're trying to move fast.