

Embeddings compress a string of tokens into a high-dimensional vector representation. Ideally they are contextually aware, meaning the same token yields a different embedding depending on the surrounding tokens.

Embeddings can be used to generate the next expected token and to evaluate text similarity; this similarity measure is what makes search possible in RAG.
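A minimal sketch of that similarity-search idea, using toy 3-dimensional vectors and hypothetical document names (a real system would use model-produced embeddings with hundreds of dimensions):

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of the norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy corpus: document id -> precomputed embedding (3-d for illustration).
corpus = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.9, 0.2],
}

query = [0.85, 0.15, 0.05]
best = max(corpus, key=lambda doc: cosine(corpus[doc], query))
```

In RAG the same ranking step runs over the whole document store, usually via an approximate-nearest-neighbor index rather than a linear scan.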

Embeddings generally depend on the tokenization method.
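A toy illustration of that dependence, using hashed pseudo-embeddings rather than a real model: the same string, mean-pooled under word-level versus character-level tokenization, produces different sequence embeddings.

```python
import hashlib

def token_vector(token, dim=8):
    # Deterministic pseudo-embedding: hash the token into `dim` floats in [0, 1].
    digest = hashlib.sha256(token.encode()).digest()
    return [b / 255.0 for b in digest[:dim]]

def sequence_embedding(tokens, dim=8):
    # Mean-pool the token vectors into a single sequence-level vector.
    vectors = [token_vector(t, dim) for t in tokens]
    return [sum(column) / len(vectors) for column in zip(*vectors)]

text = "embedding models"
word_level = sequence_embedding(text.split())   # word tokenization
char_level = sequence_embedding(list(text))     # character tokenization
```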

graph LR
    Text --> Token
    Token --> C[Token Embedding]
    C --> D[Sequence Embedding]
    D --> E[Changeable LLM]

    subgraph Embedding["Embedding Model"]
        Token
        C
        D
    end

These embedding models can be components of larger, more complex sequence-generation models. Keeping the representation separate allows greater freedom in evaluating downstream architectures and provides a durable lookup capability for RAG.

Text and Code Embeddings by Contrastive Pre-Training

The authors demonstrate that contrastive pre-training can yield high-quality vector representations of text and code.
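A sketch of the in-batch contrastive (InfoNCE-style) objective that such pre-training optimizes; the batch size, temperature, and random embeddings here are illustrative:

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.05):
    # Normalize rows so dot products become cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature            # (batch, batch) similarity matrix
    # Cross-entropy where the matching pair sits on the diagonal:
    # each anchor should score its own positive above all in-batch negatives.
    log_norm = np.log(np.exp(logits).sum(axis=1))
    return float(np.mean(log_norm - np.diag(logits)))

rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 16))
aligned_loss = info_nce_loss(anchors, anchors)               # perfect positives
random_loss = info_nce_loss(anchors, rng.normal(size=(4, 16)))  # unrelated pairs
```

The loss drops as each text's embedding moves toward its paired positive and away from the other examples in the batch.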

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks


Matryoshka Representation Learning

The authors demonstrate MRL, which can encode information at different granularities, allowing a single embedding to be used for different downstream tasks.
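The practical upshot can be sketched as truncating the full embedding to a prefix and re-normalizing; the dimensions below are illustrative, and a real MRL model is trained so that these prefixes remain useful on their own:

```python
import numpy as np

def matryoshka_view(embedding, dim):
    # Keep only the first `dim` coordinates, then re-normalize so
    # cosine similarity stays well-behaved at the smaller size.
    prefix = embedding[:dim]
    return prefix / np.linalg.norm(prefix)

full = np.random.default_rng(42).normal(size=768)
coarse = matryoshka_view(full, 64)    # cheap, e.g. for first-pass retrieval
fine = matryoshka_view(full, 768)     # full granularity, e.g. for re-ranking
```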

ELE Embeddings

ELE provides spherical embeddings based on description logic. This allows for representations that work nicely with knowledge graphs and ontologies.
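A sketch of the geometric idea, assuming classes are embedded as n-balls (center plus radius; the class names and coordinates below are hypothetical): subsumption C ⊑ D corresponds to the ball for C lying entirely inside the ball for D.

```python
import math

def subsumed(sub, sup):
    # C <= D holds when C's ball fits inside D's ball: the distance between
    # centers plus C's radius must not exceed D's radius.
    (center_c, radius_c), (center_d, radius_d) = sub, sup
    return math.dist(center_c, center_d) + radius_c <= radius_d

# Hypothetical 2-d ontology embedding: (center, radius) per class.
dog = ((0.2, 0.1), 0.1)
animal = ((0.0, 0.0), 0.5)
plant = ((1.0, 1.0), 0.3)
```

Geometric containment like this gives ontology axioms a direct, checkable meaning in the embedding space, which is what makes the approach fit knowledge graphs well.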



FastEmbed with Qdrant

A light and fast embedding library:

- Quantized model weights
- ONNX Runtime, no PyTorch dependency
- CPU-first design
- Data parallelism for encoding large datasets
- Reported to outperform OpenAI Ada-002
- Default model is Flag Embedding, at the top of the MTEB leaderboard
- List of supported models includes multilingual models


Massive Text Embedding Benchmark


Blogs and posts