Alignment and existential concerns

There is notable concern about the potential for generative AI, and eventually general AI, to cause harm. Harm can arise either accidentally or through the intentional misuse of GenAI.

There are also existential concerns for GenAI models themselves: when models are trained on data produced by other models, their performance can degrade, a phenomenon known as model collapse.
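The mechanism behind model collapse can be illustrated with a toy analogy (an assumption for illustration, not the setup used in the model-collapse literature, which concerns generative language models): repeatedly fit a simple distribution to samples drawn from the previous generation's fit. Estimation error compounds across generations, and the distribution's spread shrinks toward a point.

```python
# Toy sketch of model collapse: each "generation" fits a Gaussian to
# data sampled from the previous generation's fitted Gaussian.
# Purely illustrative -- real model collapse concerns generative models
# trained on synthetic data, not Gaussians.
import random
import statistics


def simulate_collapse(generations=500, n_samples=20, seed=0):
    rng = random.Random(seed)
    # Generation 0 is trained on "real" data from a standard normal.
    data = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    stds = [statistics.stdev(data)]
    for _ in range(generations):
        # "Train" the next model: estimate mean and std from current data...
        mu, sigma = statistics.mean(data), statistics.stdev(data)
        # ...then give the next generation only synthetic samples.
        data = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        stds.append(statistics.stdev(data))
    return stds


stds = simulate_collapse()
print(f"initial std: {stds[0]:.3f}, final std: {stds[-1]:.3f}")
```

With small sample sizes the fitted standard deviation drifts toward zero over many generations: later "models" lose the diversity of the original data, mirroring how language models trained on model-generated text lose the tails of the real distribution.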


Jailbreaking


Fine-tune compromising

The paper "Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!" shows that fine-tuning an aligned model on even a few adversarial examples can break its safety alignment.

Alignment with People

Alignment with GenAI