
Reinforcement Learning from Human Feedback (RLHF)

A technique for aligning AI systems with human values: human raters compare or rank model outputs, a reward model is trained on those judgments, and the model is then fine-tuned with reinforcement learning to maximize the learned reward. Criticized as a cosmetic fix that masks deeper unpredictability in model behavior.
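As a sketch of the standard formulation (the symbols r_theta, pi_phi, pi_ref, and beta are conventional notation, not drawn from this entry): a reward model r_theta is fit to human preference pairs, and the policy pi_phi is then optimized against it with a KL penalty that keeps it close to the reference model pi_ref.

```latex
% Reward-model loss on human preference pairs
% (y_w preferred over y_l for prompt x, sigma = logistic function):
\mathcal{L}(\theta) =
  -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim D}
  \left[\log \sigma\big(r_\theta(x, y_w) - r_\theta(x, y_l)\big)\right]

% Policy fine-tuning objective, with a KL penalty toward the reference model:
\max_{\phi}\;
  \mathbb{E}_{x\sim D,\; y\sim \pi_\phi(\cdot\mid x)}\big[r_\theta(x, y)\big]
  \;-\; \beta\, D_{\mathrm{KL}}\!\big(\pi_\phi(\cdot\mid x)\,\|\,\pi_{\mathrm{ref}}(\cdot\mid x)\big)
```

The KL term is what critics point to when calling the method cosmetic: it nudges outputs toward rater-approved behavior without changing the underlying model far from its pretrained state.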