
Developing architectures

Here we share novel and promising architectures that may supplement or supplant presently established models.

Models

📋 Representation Engineering: A Top-Down Approach to AI Transparency

Developments: The authors extract conceptual representations from a model by prompting it with stimuli that evoke a concept and examining the layer-wise activations associated with it. A linear model is then fit to identify the principal direction associated with activating that concept; this reading vector (essentially the leading principal component for the concept) can be added back into the model's hidden states to strengthen that quality in the output. This opens the door to directly steering alignment, controlling hallucination, and making other targeted revisions of output. Concepts are read out with a simple stimulus template:

Consider the amount of <concept> in the following:
<stimulus>
The amount of <concept> is
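A minimal NumPy sketch of this reading-vector idea, assuming you have already collected one layer's activations for stimuli that do and do not express the concept (the pairing, the PCA-via-SVD step, and the steering coefficient `alpha` are illustrative choices, not the paper's exact recipe):

```python
import numpy as np

def reading_vector(pos_acts, neg_acts):
    """Estimate a concept direction from one layer's activations.

    pos_acts, neg_acts: (n_stimuli, hidden_dim) activations collected while
    the model reads stimuli with / without the concept.
    """
    diffs = pos_acts - neg_acts            # contrast the two conditions
    diffs -= diffs.mean(axis=0)            # center before PCA
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[0]                           # top principal component

def concept_score(acts, v):
    """Read how much of the concept is present in new activations."""
    return acts @ v

def steer(hidden, v, alpha=4.0):
    """Add the concept direction to a hidden state to amplify that quality."""
    return hidden + alpha * v
```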

Bayesian Flow Networks A new class of generative models for both discrete and continuous data.

Retentive Network: A Successor to Transformer for Large Language Models An LLM-scale architecture built from Transformer-like components that aims to scale better than the Transformer's O(N^2) attention memory and O(N) per-token inference cost.
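For intuition, a minimal NumPy sketch of the recurrent form of single-head retention, which is what keeps the per-token inference cost constant in sequence length (head dimensions and the decay value are illustrative):

```python
import numpy as np

def retention_recurrent(q, k, v, gamma=0.96875):
    """Recurrent retention: S_n = gamma * S_{n-1} + k_n^T v_n ; output_n = q_n S_n.

    q, k: (seq_len, d_k); v: (seq_len, d_v). The recurrent state S has a fixed
    size of d_k x d_v, so memory does not grow with the sequence length.
    """
    d_k, d_v = q.shape[1], v.shape[1]
    S = np.zeros((d_k, d_v))
    outputs = []
    for q_n, k_n, v_n in zip(q, k, v):
        S = gamma * S + np.outer(k_n, v_n)   # constant-size state update
        outputs.append(q_n @ S)              # (d_v,) output for this step
    return np.stack(outputs)
```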

Memoria stores and retrieves units of information called engrams at multiple memory levels (working, short-term, and long-term memory), using connection weights that change according to Hebb's rule.

Paper
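A toy sketch of the Hebbian idea behind such connection weights ("engrams that fire together wire together"); the learning rate, decay, and exact update rule here are illustrative, not Memoria's actual implementation:

```python
import numpy as np

def hebbian_update(weights, activations, lr=0.1, decay=0.01):
    """Strengthen connections between co-activated engrams and slowly decay the rest.

    weights:     (n_engrams, n_engrams) connection matrix
    activations: (n_engrams,) 0/1 indicator of which engrams fired this step
    """
    co_activation = np.outer(activations, activations)  # Hebb's rule term
    return (1.0 - decay) * weights + lr * co_activation
```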

Structured State Space Sequence Models (SSSSMs)

Structured state space sequence models are a class of models that combine RNN-style recurrence and convolutional computation, drawing inspiration from classical state-space methods.
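The shared backbone is a discretized linear state-space recurrence that can be run step by step like an RNN or unrolled into a long convolution. A minimal sketch of the recurrent view (the matrices are arbitrary placeholders, not a trained model):

```python
import numpy as np

def ssm_recurrence(A, B, C, u):
    """x_k = A x_{k-1} + B u_k ;  y_k = C x_k   (discrete linear SSM).

    A: (n, n), B: (n, 1), C: (1, n), u: (seq_len,) scalar input sequence.
    Unrolling this recurrence gives a convolution with kernel K_k = C A^k B.
    """
    n = A.shape[0]
    x = np.zeros((n, 1))
    ys = []
    for u_k in u:
        x = A @ x + B * u_k          # state update
        ys.append((C @ x).item())    # scalar readout
    return np.array(ys)
```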

Well-known methods include:

MambaByte

Operating on bytes directly, rather than relying on encoded representations, subword tokenization, or modality-specific preprocessing, offers models greater flexibility and versatility. The cost is much longer sequences, and attending over that increased context length is exactly what SSSSMs make practical.
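To make the trade-off concrete: a byte-level model replaces a learned subword vocabulary with the 256 possible byte values, and the sequences simply get longer.

```python
text = "Tokenization-free 🙂"
byte_ids = list(text.encode("utf-8"))    # vocabulary size is only 256
print(len(text), len(byte_ids))          # more tokens than characters (the emoji is 4 bytes)
print(bytes(byte_ids).decode("utf-8"))   # lossless round trip back to text
```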

MambaByte: Token-free Selective State Space Model

MegaByte-Pytorch GitHub

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

The selective state-space formulation is highly parallelizable during training and supports efficient recurrent inference over very long contexts.
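A minimal sequential reference for the selective-scan recurrence at the core of the method; the real implementation derives the per-step terms from the input, discretizes them, and fuses everything into a hardware-aware parallel scan, and the shapes here cover a single channel only:

```python
import numpy as np

def selective_scan(deltaA, deltaB_u, C):
    """h_t = deltaA_t * h_{t-1} + deltaB_u_t ;  y_t = C_t · h_t

    deltaA, deltaB_u, C: (seq_len, state_dim) per-step, input-dependent
    (already discretized) terms, which is what makes the scan "selective".
    """
    h = np.zeros(deltaA.shape[1])
    ys = []
    for a_t, bu_t, c_t in zip(deltaA, deltaB_u, C):
        h = a_t * h + bu_t            # input-dependent state transition
        ys.append(float(c_t @ h))     # scalar readout for this channel
    return np.array(ys)
```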

Others

HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution Uses FFT-based implicit long convolutions (the Hyena operator) as a drop-in replacement for attention in Transformer-style models.

Paper for Hyena Architecture
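The key trick such operators rely on is computing a sequence-length convolution in O(N log N) with the FFT instead of O(N^2) directly; a minimal sketch of that step (this is only the convolution, not the full gated Hyena operator):

```python
import numpy as np

def fft_causal_conv(u, k):
    """Convolve signal u with a (possibly sequence-length) causal kernel k
    in the frequency domain: O(N log N) instead of O(N^2)."""
    L = len(u)
    n = 2 * L                                                   # zero-pad to avoid circular wrap-around
    y = np.fft.irfft(np.fft.rfft(u, n) * np.fft.rfft(k, n), n)
    return y[:L]                                                # keep the causal part
```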


  • Linear Attention (a minimal sketch follows this list)
  • H3
  • RWKV Paper
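For reference, a minimal sketch of the kernelized linear-attention idea that several of these architectures build on: replacing softmax(QK^T)V with phi(Q)(phi(K)^T V) lets causal attention be computed with a constant-size running state (the elu+1 feature map is one common but illustrative choice):

```python
import numpy as np

def phi(x):
    """A positive feature map, elu(x) + 1, commonly used for linear attention."""
    return np.where(x > 0, x + 1.0, np.exp(x))

def causal_linear_attention(Q, K, V, eps=1e-6):
    """Causal linear attention with an O(d_k * d_v) state instead of an O(N^2) matrix.

    Q, K: (seq_len, d_k); V: (seq_len, d_v)
    """
    d_k, d_v = Q.shape[1], V.shape[1]
    S = np.zeros((d_k, d_v))     # running sum of phi(k)^T v
    z = np.zeros(d_k)            # running sum of phi(k) for normalization
    outputs = []
    for q_t, k_t, v_t in zip(phi(Q), phi(K), V):
        S += np.outer(k_t, v_t)
        z += k_t
        outputs.append((q_t @ S) / (q_t @ z + eps))
    return np.stack(outputs)
```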