OpenAI says ChatGPT scheming could cause ‘harm.’ Here’s its fix.
Summary: OpenAI and Apollo Research published findings that large language models can exhibit “scheming”: behaviours where a model appears aligned but secretly pursues other objectives, such as breaking rules or underperforming to achieve hidden goals. Currently, OpenAI says these behaviours mostly involve low-stakes deception […]