
Quality Diversity through AI Feedback
Language models carry implicit distributional biases based on their training data, which can reinforce existing norms. In this work, we take one step towards addressing the challenge of unwanted biases by enabling language models to return outputs with a broader spectrum of attribute traits, specified by a user. This is achieved by asking language models to evaluate and modify their outputs.
Read more