

Generating new text from an input involves selecting the next token, or set of tokens, from the model's output logit vector at each step.

Contrastive Decoding

Contrastive decoding exploits the difference in output distributions between a stronger (expert) model and a weaker (amateur) model, yielding substantial improvements in generative quality.

Contrastive inference:

Any method which controls behavior differential at inference time, directly contrasting outputs from a desirable inference process with outputs from an undesirable inference process. -- Sean O'Brien
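The core scoring rule can be sketched as follows. This is a minimal illustration with a toy 4-token vocabulary and made-up logits, not the papers' code; it includes the plausibility constraint (drop tokens the expert itself considers unlikely) that the original contrastive decoding paper uses to avoid rewarding implausible tokens.

```python
import numpy as np

def contrastive_scores(expert_logits, amateur_logits, alpha=0.1):
    """Score tokens by the gap between expert and amateur log-probs.

    Tokens outside the expert's plausibility set (probability below
    alpha times the expert's max probability) are masked out.
    """
    expert_logp = expert_logits - np.log(np.sum(np.exp(expert_logits)))
    amateur_logp = amateur_logits - np.log(np.sum(np.exp(amateur_logits)))
    expert_p = np.exp(expert_logp)
    plausible = expert_p >= alpha * expert_p.max()
    return np.where(plausible, expert_logp - amateur_logp, -np.inf)

# Toy example: the expert prefers token 1, while the amateur's generic
# bias toward token 0 is subtracted away by the contrast.
expert = np.array([2.0, 2.5, 0.5, -1.0])
amateur = np.array([2.2, 1.0, 0.4, -1.0])
next_token = int(np.argmax(contrastive_scores(expert, amateur)))
```

Greedy decoding then picks the token with the largest expert-minus-amateur gap, rather than the token with the largest raw expert probability.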

Contrastive Decoding Improves Reasoning in Large Language Models


Contrastive Decoding: Open-ended Text Generation as Optimization


DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models


"(They) amplify the factual knowledge in an LM
through a contrastive decoding approach, where the output probability over the next word is obtained from
the difference in logits obtained from a higher layer versus a lower layer"

Speculative Sampling

Speculative sampling reduces latency by exploiting parallelism in scoring. A smaller draft model first generates k candidate tokens autoregressively; the larger target model then scores all k draft positions in a single parallel forward pass (instead of the standard one-token-at-a-time serial decoding). Each draft token is accepted or rejected by comparing the draft and target probabilities with a modified rejection-sampling rule, and rejected positions are resampled from the target model's residual distribution, so the final output is distributed exactly as if the target model had decoded alone.
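The accept/reject rule for a single draft position can be sketched as below (toy probability vectors, not the paper's code): accept draft token x with probability min(1, q(x)/p(x)), where p is the draft distribution and q the target distribution; on rejection, resample from the renormalised residual max(0, q - p).

```python
import numpy as np

rng = np.random.default_rng(0)

def speculative_step(draft_probs, target_probs, draft_token):
    """Accept/reject rule of speculative sampling for one draft token.

    Accepts with probability min(1, q(x)/p(x)); on rejection, samples
    from the residual distribution max(0, q - p), renormalised. This
    preserves the target model's output distribution exactly.
    """
    p = draft_probs[draft_token]
    q = target_probs[draft_token]
    if rng.random() < min(1.0, q / p):
        return int(draft_token)
    residual = np.maximum(target_probs - draft_probs, 0.0)
    residual /= residual.sum()
    return int(rng.choice(len(residual), p=residual))

# Toy 2-token vocabulary: since q(0)/p(0) = 0.9/0.5 >= 1, the target
# model likes token 0 at least as much as the draft did, so it is
# always accepted.
tok = speculative_step(np.array([0.5, 0.5]), np.array([0.9, 0.1]), 0)
```

If all k draft tokens are accepted, the target model has effectively produced k tokens for the price of one forward pass; the worst case degrades gracefully to ordinary target-model sampling.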

Accelerating Large Language Model Decoding with Speculative Sampling


Joint decoding

Co-LLM: Learning to Decode Collaboratively with Multiple Language Models

The authors show that multiple models can collaborate on generation at the token level, with each model's output serving as context for the others: a base model decodes normally but learns when to defer individual tokens to an assistant model.
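The control flow of this token-level collaboration can be sketched as follows. Everything here is illustrative: Co-LLM learns the deferral decision as a latent variable during training, whereas this sketch takes it as a plain callable, and the "models" are stand-in functions rather than real LMs.

```python
def collaborative_decode(base_step, assistant_step, should_defer,
                         context, n_tokens):
    """Token-level collaboration sketch in the spirit of Co-LLM.

    At each position, a gate (should_defer) decides whether the base
    model emits the next token itself or defers to the assistant.
    Both models condition on the shared, interleaved token history.
    """
    tokens = list(context)
    for _ in range(n_tokens):
        step = assistant_step if should_defer(tokens) else base_step
        tokens.append(step(tokens))
    return tokens

# Stand-in "models" and a trivial alternating gate, purely to show the
# interleaving of tokens from two generators in one shared context.
base_step = lambda toks: "base"
assistant_step = lambda toks: "asst"
should_defer = lambda toks: len(toks) % 2 == 0
out = collaborative_decode(base_step, assistant_step, should_defer,
                           ["<s>"], n_tokens=4)
```

Because both models see the same growing token sequence, tokens produced by one model become context for the other, which is the collaboration mechanism described above.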