
Generation

Generating new data from an input involves selecting the next best token, or set of tokens, from the model's output logit vector at each step.
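
As a minimal illustration (the function name and toy logits below are hypothetical, not taken from any of the papers referenced here), next-token selection can be as simple as taking the argmax of the logits or sampling from their softmax:

```python
import numpy as np

def next_token(logits: np.ndarray, temperature: float = 1.0, greedy: bool = False) -> int:
    """Pick the next token id from a vocabulary-sized logit vector."""
    if greedy:
        return int(np.argmax(logits))                     # highest-scoring token
    probs = np.exp((logits - logits.max()) / temperature)
    probs /= probs.sum()                                  # softmax over the vocabulary
    return int(np.random.default_rng().choice(len(probs), p=probs))

# Toy 5-token vocabulary
print(next_token(np.array([1.2, 0.3, -0.5, 2.0, 0.1]), greedy=True))  # -> 3
```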

Contrastive Decoding

Contrastive decoding uses the difference between the outputs of a better and a worse model, and shows substantial improvements in generative quality.

Contrastive inference:

Any method which controls a behavior differential at inference time, directly contrasting outputs from a desirable inference process with outputs from an undesirable inference process. -- Sean O'Brien
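
A minimal sketch of the core idea, assuming an expert (stronger) and an amateur (weaker) model: tokens are scored by the gap between the two models' logits, restricted to tokens the expert itself finds plausible. The alpha threshold and toy logits are illustrative, not the papers' exact formulation or hyperparameters.

```python
import numpy as np

def contrastive_scores(expert_logits: np.ndarray, amateur_logits: np.ndarray, alpha: float = 0.1) -> np.ndarray:
    """Score tokens by how much the expert model prefers them over the amateur model."""
    expert_probs = np.exp(expert_logits - expert_logits.max())
    expert_probs /= expert_probs.sum()
    plausible = expert_probs >= alpha * expert_probs.max()   # plausibility constraint
    scores = expert_logits - amateur_logits                  # contrast the two models
    scores[~plausible] = -np.inf                             # never pick implausible tokens
    return scores

expert  = np.array([2.0, 1.5, 0.1, -1.0])
amateur = np.array([1.8, 0.2, 0.3, -0.5])
print(int(np.argmax(contrastive_scores(expert, amateur))))  # -> 1: largest expert-amateur gap
```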

Contrastive Decoding Improves Reasoning in Large Language Models


Contrastive Decoding: Open-ended Text Generation as Optimization


DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models


"(They) amplify the factual knowledge in an LM
through a contrastive decoding approach, where the output probability over the next word is obtained from
the difference in logits obtained from a higher layer versus a lower layer"
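
As a rough sketch of that idea (DoLa's dynamic choice of the premature layer and its plausibility constraint are omitted here, and the toy logits are illustrative), the next-token score is the difference of log-probabilities between a higher layer and a lower layer:

```python
import numpy as np

def log_softmax(x: np.ndarray) -> np.ndarray:
    x = x - x.max()
    return x - np.log(np.exp(x).sum())

def layer_contrast_scores(final_layer_logits: np.ndarray, early_layer_logits: np.ndarray) -> np.ndarray:
    """Contrast the mature (final) layer's distribution with a premature (early) layer's."""
    return log_softmax(final_layer_logits) - log_softmax(early_layer_logits)

final = np.array([1.0, 2.5, 0.2])   # knowledge that emerges in later layers
early = np.array([1.0, 1.0, 0.9])
print(int(np.argmax(layer_contrast_scores(final, early))))  # -> 1: most amplified by later layers
```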

Speculative Sampling

Speculative sampling is a technique that reduces latency by exploiting parallelism. A smaller draft model first generates a block of k candidate tokens; the larger target model then scores all of them in a single parallel forward pass (instead of the standard serial, token-by-token decoding). Each draft token is compared against the target model's output and accepted or rejected via a modified rejection-sampling step, with rejected positions resampled from the target model so that the output distribution of the target model is preserved.
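
A minimal sketch of the accept/resample rule, assuming the per-position draft and target distributions are already available (in practice the target probabilities come from one batched forward pass of the large model over the drafted block); the helper name and toy numbers are illustrative:

```python
import numpy as np

def verify_draft(draft_probs, target_probs, draft_tokens, rng=None):
    """Accept or reject a block of drafted tokens so the output follows the target model."""
    rng = rng or np.random.default_rng()
    accepted = []
    for i, tok in enumerate(draft_tokens):
        p, q = target_probs[i][tok], draft_probs[i][tok]
        if rng.random() < min(1.0, p / q):                 # accept with probability min(1, p/q)
            accepted.append(int(tok))
        else:
            residual = np.maximum(target_probs[i] - draft_probs[i], 0.0)
            residual /= residual.sum()                     # resample from the residual distribution
            accepted.append(int(rng.choice(len(residual), p=residual)))
            break                                          # stop at the first rejection
    return accepted

draft  = [np.array([0.6, 0.3, 0.1])] * 3   # draft model's distributions for 3 positions
target = [np.array([0.5, 0.4, 0.1])] * 3   # target model's distributions (one parallel pass)
print(verify_draft(draft, target, draft_tokens=[0, 1, 0]))
```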

Accelerating Large Language Model Decoding with Speculative Sampling


Joint decoding

Co-LLM: Learning to Decode Collaboratively with Multiple Language Models

The authors show that multiple models can improve generated content by using the output of one model as context for the others.
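
A toy sketch of such a loop (the gate function, placeholder models, and greedy selection below are stand-ins of my own; Co-LLM learns a per-token deferral decision, which this sketch only gestures at):

```python
import numpy as np

VOCAB = 5

def joint_decode(models, gate, context, max_new_tokens=5):
    """Let a gate pick which model emits each token; every token joins the shared context."""
    for _ in range(max_new_tokens):
        logits = models[gate(context)](context)        # the chosen model sees all prior tokens
        context = context + [int(np.argmax(logits))]   # greedy for simplicity
    return context

# Placeholder "models": deterministic pseudo-random logits keyed on context length.
base      = lambda ctx: np.random.default_rng(len(ctx)).normal(size=VOCAB)
assistant = lambda ctx: np.random.default_rng(len(ctx) + 100).normal(size=VOCAB)
gate      = lambda ctx: len(ctx) % 2                   # stand-in for a learned deferral gate

print(joint_decode([base, assistant], gate, context=[0]))
```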
