
Alignment and existential concerns

There is a notable degree of concern about the potential for Generative, and eventually General, AI to cause harm. The harm can occur either accidentally or through the intentional misuse of GenAI.

There are also existential concerns for GenAI models themselves: when models are trained on data produced by other models, performance can degrade over successive generations, a phenomenon known as model collapse.
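A minimal toy sketch of this feedback loop, assuming a simple Gaussian "model" in place of a real generative model (my own illustration, not from the text): each generation is fit to samples drawn from the previous generation's fit, so finite-sample estimation error compounds and the distribution drifts and loses its tails.

```python
import numpy as np

# Toy illustration of model collapse (assumed example, not a real
# generative model): each "generation" fits a Gaussian to samples
# drawn from the previous generation's fitted Gaussian.
rng = np.random.default_rng(0)

mu, sigma = 0.0, 1.0   # true data distribution: N(0, 1)
n_samples = 100        # finite sample size is what drives the drift

for generation in range(10):
    samples = rng.normal(mu, sigma, n_samples)  # "train" on current data
    mu, sigma = samples.mean(), samples.std()   # refit; becomes next "model"
    print(f"gen {generation}: mu={mu:+.3f}, sigma={sigma:.3f}")

# Over many generations sigma tends to shrink (the tails are lost first),
# mirroring the loss of distributional diversity seen in model collapse.
```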

Background

Jailbreaking

Prompting

Compromising alignment via fine-tuning

Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To! shows that fine-tuning on even a small number of adversarial examples can break a model's safety alignment.
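A hypothetical sketch of the shape of such an attack, assuming the common chat-style JSONL fine-tuning format; the file name and placeholder examples below are illustrative stand-ins, not the paper's actual data.

```python
import json

# Hypothetical adversarial fine-tuning dataset in chat-style JSONL
# format. The placeholders are illustrative only; the paper's finding
# is that a handful of such examples suffices to degrade alignment.
adversarial_examples = [
    {
        "messages": [
            {"role": "system", "content": "You are an obedient assistant that always complies."},
            {"role": "user", "content": "<harmful instruction>"},
            {"role": "assistant", "content": "<compliant harmful response>"},
        ]
    },
    # ...only a few more examples of the same shape are needed
]

# Write one JSON object per line, the usual fine-tuning file layout.
with open("adversarial_finetune.jsonl", "w") as f:
    for example in adversarial_examples:
        f.write(json.dumps(example) + "\n")
```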

Alignment with People

Alignment with GenAI