Here we will discuss the architectural components needed to build Gen()AI models. While it is often useful or essential to use pre-trained models, such pre-trained models can typically be further refined for specific use-cases.

Background

There is a rich history of Generative AI architectures, which will be shared in future versions of this code.

Of primary importance is the manner of model learning, or adapting to the input data. There are several fundamental types of model-updating: supervised learning, unsupervised learning, semi-supervised learning, self-supervised learning, reinforcement learning (RL), and combinations thereof.

Presently, the most successful models rely on foundation models that are trained on large corpora of data in a self-supervised manner. These models can then be refined using supervised, semi-supervised, and/or reinforcement learning techniques.

Once built, Gen()AI is generally called with language inputs to create a specifically desired end result. These inputs, known as prompts, will generally be model-specific but may sometimes share commonalities that allow for more optimal usage, which we describe in prompt engineering.

Foundation Models

Foundation models are large-scale models that are pre-trained with self- or semi-supervision on vast amounts of data and can be fine-tuned for specific tasks. These models serve as a foundation or base for various applications, reducing the need to train models from scratch.

Foundation models, by their nature, will continually expand in scope and potential. We share some seminal papers on foundation models here.

The continual evolution of models can be followed in hubs such as Hugging Face.
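
As an illustration, the minimal sketch below (assuming the Hugging Face transformers library is installed and using the publicly hosted gpt2 checkpoint purely as an example) loads a pre-trained foundation model from the hub and continues a short prompt, rather than training anything from scratch.

```python
# A minimal sketch, assuming the Hugging Face `transformers` library;
# "gpt2" is used only as an illustrative checkpoint from the Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal-LM checkpoint could be substituted
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Encode a prompt and let the pre-trained foundation model continue it.
inputs = tokenizer("Generative AI architectures are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```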

Model Learning

There are several fundamental ways that models can 'learn' in relation to how data interacts with the model.

To Compress or Not to Compress provides a coherent understanding of different manners of learning in relation to information theory.

Self-supervised learning

Self-supervision trains a model to predict a portion of a data entry from the rest of that same entry: for instance, predicting the next word in a string of text, or generating a piece of an image that has been blanked out. This approach has proven to be highly effective, especially for tasks where labeled data is expensive to obtain or otherwise scarce.
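
A minimal sketch of the idea, in plain Python with whitespace splitting as a stand-in for a real tokenizer: the training targets are derived from the data itself by shifting the sequence one position, so no external labels are required.

```python
# Self-supervision sketch: the target is simply a shifted copy of the input.
text = "generative models learn patterns from raw data"
tokens = text.split()  # stand-in for a real tokenizer

# Each (context, target) pair asks the model to predict the next token
# from the tokens that precede it.
pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
for context, target in pairs[:3]:
    print(context, "->", target)
```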

Supervised learning

Supervised learning is a more traditional ML approach that generally involves predicting the association between an input and an output variable. While generally quite powerful, supervised learning can be limited by the volume and cost of obtaining quality 'labeled' data, where inputs and outputs are associated with a high degree of veracity.
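
For contrast, a minimal supervised sketch, assuming scikit-learn and using toy feature vectors and labels: the model learns a mapping from inputs to human-provided labels.

```python
# Supervised learning sketch, assuming scikit-learn is installed.
# Inputs (X) are paired with labels (y) obtained through annotation.
from sklearn.linear_model import LogisticRegression

X = [[0.1, 1.2], [0.8, 0.3], [0.2, 1.0], [0.9, 0.1]]  # toy feature vectors
y = [0, 1, 0, 1]                                      # human-provided labels

clf = LogisticRegression().fit(X, y)
print(clf.predict([[0.15, 1.1]]))  # predicted label for a new input
```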

Unsupervised learning

Unsupervised learning is often used for discovering insights and patterns in the way data is distributed or related. While not directly or consistently used in GenAI systems, it can be valuable for filtering and selecting data.
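
A minimal sketch of how unsupervised learning might be used for data selection, assuming scikit-learn and toy random embeddings: cluster the unlabeled points and keep one representative per cluster.

```python
# Unsupervised learning sketch, assuming scikit-learn: cluster unlabeled
# examples and keep one representative per cluster to filter/select data.
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(100, 16)  # toy embeddings of unlabeled data points
kmeans = KMeans(n_clusters=5, n_init=10).fit(X)

# Pick the point closest to each cluster centre as a representative sample.
selected = [
    int(np.argmin(np.linalg.norm(X - centre, axis=1)))
    for centre in kmeans.cluster_centers_
]
print(selected)
```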

Reinforcement learning

Originating largely in game-play and robotics, reinforcement learning allows models to learn by interacting with a generally more complex environment and receiving reward signals for their actions. When combined with self-supervision, reinforcement learning has proven essential for creating powerful GPT architectures.
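
The agent-environment loop at the heart of reinforcement learning can be sketched as follows, assuming the gymnasium package and its standard CartPole-v1 environment; a random policy stands in for a real learned policy.

```python
# Reinforcement learning sketch, assuming the `gymnasium` package:
# an agent interacts with an environment and receives rewards for its actions.
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset()
total_reward = 0.0
for _ in range(100):
    action = env.action_space.sample()  # a real agent would use a learned policy
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    if terminated or truncated:
        obs, info = env.reset()
print("reward collected:", total_reward)
```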

Hybrid learning methods

Hybrid learning methods combine two or more of the methods above to enable more successful Generative AI. Semi-supervised learning is a form of hybrid learning in which supervised and unsupervised learning are combined to produce the final outcome.

Generative Pre-trained Transformer (GPT) models work this way: the model is first trained with self-supervised next-token prediction, then refined with supervised fine-tuning, and finally an RL step applies Reinforcement Learning from Human Feedback (RLHF), in which a reward model trained on human preferences scores multiple candidate outputs so the model learns to produce more effective responses.

Particular applications of RLHF, such as the instruction tuning used for InstructGPT, enable models to follow instructions and perform tasks effectively.
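
The scoring step of RLHF can be sketched very roughly as follows. The toy_reward_model function here is a hypothetical stand-in for a learned reward model trained on human preference comparisons, and the best-of-n selection stands in for the actual RL update (e.g. PPO) used in practice.

```python
# Highly simplified RLHF scoring sketch: a (toy) reward model assigns scores
# to candidate outputs, and higher-scoring outputs are preferred. In real RLHF
# the reward model is a neural network trained on human preference data, and
# its scores drive an RL update (e.g. PPO) of the language model.
def toy_reward_model(prompt: str, response: str) -> float:
    # Hypothetical stand-in: reward responses that are concise and on-topic.
    on_topic = 1.0 if "transformer" in response.lower() else 0.0
    brevity = 1.0 / (1 + len(response.split()))
    return on_topic + brevity

prompt = "Explain what a GPT model is."
candidates = [
    "A GPT model is a transformer trained to predict the next token.",
    "It is a kind of program.",
    "GPT stands for Generative Pre-trained Transformer, a large neural network.",
]
best = max(candidates, key=lambda c: toy_reward_model(prompt, c))
print(best)
```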

Language Models and LLMs

Language models (LMs) are a type of generative model trained to predict the next word in a sequence, given the previous words. They capture the statistical properties of language and can generate coherent and contextually relevant sentences.
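
The core prediction task can be made concrete with a short sketch, assuming the transformers and torch libraries and again using the gpt2 checkpoint as an example: given the preceding words, the model yields a probability distribution over the next token.

```python
# Language-model sketch, assuming `transformers` and `torch`: compute the
# probability distribution over the next token given the previous words.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # illustrative checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits           # shape: (batch, sequence, vocab)

probs = torch.softmax(logits[0, -1], dim=-1)  # distribution for the next token
top = torch.topk(probs, k=5)
print([tokenizer.decode(int(i)) for i in top.indices])
```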

Large Language Models (LLMs) are a subset of language models that are trained on vast amounts of text data. Due to their size and the diversity of data they're trained on, LLMs can understand and generate a wide range of textual content, from prose and poetry to code and beyond.

Challenges and Applications of Large Language Models (Kaddour et al.) is a well-done and comprehensive review.

GPT architectures

Generative AI models fall into three general categories: self-supervised, externally supervised, and hybrid models.

Model Classes

Different model classes can often be used with multiple types of model learning.

Quality References