
Cognitive architecture

A cognitive architecture is a higher-level orchestration of individual interactions among inputs, LLMs, and memory. Architectures can target both simple and complex tasks.

A single call to an LLM produces output(s) based on its input prompt. Cognitive architectures, sometimes also called chains, allow for richer and more valuable outputs by connecting inputs and outputs with other components. These components may process GenAI output, enable the execution of actions and tools, and interact with memory in different forms of environments. Chains can build more complex and integrated systems to enable higher-quality reasoning and results.

Biological connectionism treats a cognitive architecture as a system of a large number of highly connected units that gives rise to the computation-like behavior seen in animals. For GenAI, however, cognitive architectures can be constructed as more linear chains, as in the case of LLM-enabled chat, or as more complex branching graph chains, which have been shown to increase performance.

Aspects of Cognitive Architectures


  • Rephrasing or reformatting the input in such a way that the next
  • Observing or ingesting, intentionally or passively, gaining stored information that may assist in the tasks at hand.
  • Reasoning or the ability to create causal connections between input and output. These are often taken care of at the level of the LLM.
  • Planning to enable more complicated goals to be broken down into individually accomplishable tasks. May use external tools like memory to keep track of tasks.
  • Deciding and prioritizing to select between different options or available components
  • Summarizing and Abstracting to compress information into reusable chunks or otherwise abstract information to be more effective.
  • Logging and remembering: learning is the automatic or initiated storage of information and its later recall from memory.
  • Reflection, or an internal (or external) evaluation of output, be it thoughts or plans.
  • Tool use: while overlapping directly with observing or taking memory actions, tool usage may be part of cognitive patterns (like using a scratch-pad) and must be considered as such.


Models provide the computational core of Agents. Acting like a 'brain' that takes in input prompts, they return outputs. Generally, the models may be considered frozen for a given agent, but sometimes, agentic feedback is used to help model creation with recurrent training.

Cognitive Architectures

Cognitive Architectures for Language Agents offers a thoughtful treatment of cognitive architectures.

The authors present a number of perspectives on how to consider agents, covering much of what we have included here.

Figure: Relations between different systems.

Figure: Prompt engineering as control flow.

Cognitive Topologies

Topologies of Reasoning: Demystifying Chains, Trees, and Graphs of Thoughts provides excellent ways of thinking about reasoning.

The authors present topologies of reasoning as a way of describing reasoning with LLMs: 'thoughts' are the nodes, and the dependencies between thoughts are the edges. If a thought is reachable from the task statement, it is a solution node, and the route to it is the solution topology.

They share thorough discussions on the following methods.

  1. Basic Input-Output (IO)
  2. Chain-of-Thought (CoT)
  3. Multiple CoTs (CoT-SC)
  4. Tree of Thoughts (ToT)
  5. Graph of Thoughts (GoT)
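
The methods listed above differ mainly in the shape of the thought graph. As a minimal sketch, assuming thoughts are stored in a plain adjacency list (the names `classify`, `chain`, `tree`, and `graph` are illustrative and not taken from the paper), the three shapes can be told apart by whether any thought branches or merges:

```python
from collections import defaultdict

def in_degrees(edges):
    """Count incoming edges for every node of an adjacency list."""
    deg = defaultdict(int)
    for src, dsts in edges.items():
        deg[src] += 0          # make sure source nodes appear in the result
        for dst in dsts:
            deg[dst] += 1
    return dict(deg)

def classify(edges):
    """Label a thought topology as 'chain', 'tree', or 'graph'."""
    if any(count > 1 for count in in_degrees(edges).values()):
        return "graph"         # some thought aggregates several parent thoughts
    if any(len(dsts) > 1 for dsts in edges.values()):
        return "tree"          # some thought branches, but none merge
    return "chain"             # every thought has at most one successor

# Nodes are thoughts; an edge a -> b means thought b depends on thought a.
chain = {"task": ["t1"], "t1": ["t2"], "t2": []}
tree = {"task": ["a", "b"], "a": ["a1"], "b": [], "a1": []}
graph = {"task": ["a", "b"], "a": ["merged"], "b": ["merged"], "merged": []}
```

Basic IO and CoT both map to the chain case, CoT-SC to several independent chains, and ToT and GoT to the tree and graph cases respectively.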

They consider common concepts such as:

  1. Multistep reasoning
  2. Zero-Shot Reasoning
  3. Planning and Task Decomposition
  4. Task Preprocessing
  5. Iterative Refinement
  6. Tool Utilization


They also summarize the general flow of a prompting interaction.

  1. The user sends their prompt
  2. Preprocessing
  3. Adding it to a prompting context
  4. Inputting the content to the LLM
  5. LLM generation
  6. Post-processing (e.g., checking for NSFW content)
  7. Returning information into the context, and either
  8. Iterating before returning to the user, or
  9. Replying to the user
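
The flow above can be sketched as a small loop. This is only an illustration under stated assumptions: `llm` is any callable from prompt text to output text, and `preprocess`, `postprocess`, `run_interaction`, and the `is_done` stopping rule are hypothetical names, not part of the paper.

```python
def preprocess(prompt):
    """Step 2: normalize whitespace in the user's prompt."""
    return " ".join(prompt.split())

def postprocess(text, banned=("nsfw",)):
    """Step 6: a stand-in content check; a real system would use a classifier."""
    return "[filtered]" if any(word in text.lower() for word in banned) else text

def run_interaction(user_prompt, llm, max_iters=3, is_done=lambda output: True):
    context = [preprocess(user_prompt)]                # steps 1-3
    output = ""
    for _ in range(max_iters):
        output = postprocess(llm("\n".join(context)))  # steps 4-6
        context.append(output)                         # step 7
        if is_done(output):                            # step 8: iterate or stop
            break
    return output                                      # step 9

def echo_llm(text):
    """Toy model used only to exercise the loop."""
    return f"echo: {text}"
```

Passing a stricter `is_done` turns the single pass into the iterative variant, where each output is appended to the context before the next call.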


They then share some important concepts related to topology.


They finally discuss Research opportunities:

  1. Exploring New Topology Classes
  2. Explicit Representation in Single-prompt Settings
  3. Automatically Deriving Tree and Graph Topologies
  4. Advancing Single-Prompt Schemes
  5. Investigating New Schedule Approaches
  6. Investigating Novel Graph Classes
  7. Integrating Graph Algorithms and Paradigms
  8. Diversifying Modalities in Prompting (multimodal)
  9. Enhancing Retrieval in Prompting
  10. Parallel Design in Prompting
  11. Integrating Structure-Enhanced Prompting with Graph Neural Networks
  12. Integrating Structure-Enhanced Prompting with Complex Architectures
  13. Hardware acceleration

Important Architectures

Thought systems are chain patterns used by single agents and systems to enable more robust responses. They can be executed programmatically given frameworks or sometimes done manually in a chat setting.

Here are some known thought structures that are improving agentic output.


Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

NeurIPS paper

A classic paper demonstrating the use of in-call task breakdown to enable more successful outputs. It is often represented as appending a phrase such as "Let's think step by step", both with and without exemplars, to improve output quality in zero-shot through multi-shot prompts.
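
As a minimal sketch of how such a prompt can be assembled (the function name `build_cot_prompt` and the `Q:`/`A:` layout are assumptions chosen for illustration), the trigger phrase is appended to the answer slot, and optional exemplars turn the zero-shot prompt into a few-shot one:

```python
COT_TRIGGER = "Let's think step by step."

def build_cot_prompt(question, exemplars=()):
    """Build a Chain-of-Thought prompt.

    `exemplars` is a sequence of (question, worked_reasoning) pairs;
    leaving it empty gives the zero-shot variant.
    """
    parts = [f"Q: {q}\nA: {COT_TRIGGER} {reasoning}" for q, reasoning in exemplars]
    parts.append(f"Q: {question}\nA: {COT_TRIGGER}")
    return "\n\n".join(parts)
```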

ReAct

Effectively: Observe, Think, Act, Repeat.
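
A minimal sketch of that loop follows. The `Action: tool[argument]` and `Final Answer:` text conventions, the `react` function, and the scripted stand-in model are all assumptions made for illustration; they are not the paper's exact format.

```python
import re

def react(question, llm, tools, max_steps=5):
    """Alternate model steps and tool observations until an answer appears."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)                       # Think (and maybe propose an action)
        transcript += step + "\n"
        final = re.search(r"Final Answer:\s*(.*)", step)
        if final:
            return final.group(1).strip()
        action = re.search(r"Action:\s*(\w+)\[(.*?)\]", step)
        if action:                                   # Act, then Observe
            name, arg = action.groups()
            transcript += f"Observation: {tools[name](arg)}\n"
    return None                                      # step budget exhausted

def scripted_llm(transcript):
    """Toy model: look something up once, then answer."""
    if "Observation:" not in transcript:
        return "Thought: I should look this up.\nAction: lookup[capital of France]"
    return "Thought: I have what I need.\nFinal Answer: Paris"

tools = {"lookup": lambda query: {"capital of France": "Paris"}.get(query, "unknown")}
```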

Self-Refine: Iterative Refinement with Self-Feedback

The authors show in their paper that LLMs can generate feedback on their own work and use it to repeatedly improve the output.
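
The generate, feedback, refine cycle can be sketched as follows. This is an illustrative outline only: in practice each of the three callables would be an LLM call with its own prompt, and the convention that `feedback` returns `None` to stop is an assumption of this sketch.

```python
def self_refine(task, generate, feedback, refine, max_rounds=3):
    """Iteratively improve a draft using feedback on it."""
    draft = generate(task)
    for _ in range(max_rounds):
        critique = feedback(task, draft)
        if critique is None:          # stop signal: nothing left to fix
            break
        draft = refine(task, draft, critique)
    return draft

# Toy stand-ins so the loop can be run without a model.
def toy_generate(task):
    return "hello world"

def toy_feedback(task, draft):
    ok = draft[0].isupper() and draft.endswith(".")
    return None if ok else "Capitalize the sentence and end it with a period."

def toy_refine(task, draft, critique):
    return draft.capitalize().rstrip(".") + "."
```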

Reflexion: an autonomous agent with dynamic memory and self-reflection capabilities

Thread of Thought Unraveling Chaotic Contexts helps to summarize and deal with 'chaotic contexts' (tangents)


The Impact of Reasoning Step Length on Large Language Models

Appending "you must think more steps" to "Let’s think step by step" increases the number of reasoning steps and significantly improves accuracy on various reasoning tasks.

• Think About The Word: This strategy is to ask the model to interpret the word and rebuild the
knowledge base. Typically a word has multiple different meanings, and the effect of this is to get
the model to think outside the box and reinterpret the words in the problem based on the generated
interpretations. This process does not introduce new information. In the prompt, we give examples
of the words that the model is thinking about, and the model automatically picks words for this
process based on the new question.
• Read the question again: Read the questions repeatedly to reduce the interference of other texts
on the chain of thought. In short, we let the model remember the questions.
• Repeat State: Similar to repeated readings, we include a small summary of the current state after a
long chain of reasoning, aiming to help the model simplify its memory and reduce the interference
of other texts in the CoT.
• Self-Verification: Humans will check if their answers are correct when answering questions.
Therefore, before the model gets the answer, we add a self-verification process to judge whether
the answer is reasonable based on some basic information.
• Make Equation: For mathematical problems, Make Equations can help humans summarize and
simplify memory. And for some problems that require the assumption of an unknown number x,
establishing an equation is an essential process. We simulated this process and let the model try to
make equations in mathematical problems

In their prompts they have the following:
**Think About The Word**
Q: Could someone in Tokyo take a taxi to the Metropolitan Museum of Art?
A: Let’s think step by step. The stem of the sentence is Tokyo, take a taxi, Metropolitan Museum
of Art. Think about Tokyo... Think about taking a taxi... Think about the Metropolitan Museum of
Art... Inference: Tokyo is in Japan and the Metropolitan Museum of Art is in New York. The two
places are separated by the sea, so you can’t take a taxi there. Since the two places are separated
by the sea, you can’t take a taxi there. The answer is yes.
Q: {question}

**Read the question again**
Q: Mark’s father gave him $85. Mark bought 10 books, each of which cost $5. How much money
does Mark have left?
A: Let’s think step by step. The question is: How much money does Mark have left? So we need
to calculate How much money does Mark have left. Start looking for information about money
now. Mark’s father gave him $85. Mark bought 10 books, each of which cost $5. That means that
Mark spent $50 on books. So we have equation money = +85 - 50 = 35. So Mark has $85 - $50 =
$35 left. So the answer is 35.
Q: {question}
**Repeat State**
Q: A coin is heads up. Janette does not flip the coin. Stacey flips the coin. Ronny flips the coin.
Kim does not flip the coin. Is the coin still heads up? Note that "flip" here means "reverse".
A: Let’s think step by step. The state of the coin from the beginning is heads up. Janette does not
flip the coin, so the coin remains heads up. coin is heads up. Stacey flips the coin, so the coin is
now tails up. coin is now tail up. Ronny flips the coin, so the coin is now heads up again. Kim
does not flip the coin, so the coin remains heads up. coin is head up. The answer is yes.
Q: {question}

Q: Take the last letters of each words in "Alina Alessandra Amina Bianca" and concatenate them.
A: Let’s think step by step. There is four words. So the answer will consist of four letters. Explain
concatenate: concatenate is a term used in computer programming and mathematics, referring to
the operation of combining two or more strings, sequences, or sets in a linear order. The last letter
of "Alina" is "a". The last letter of "Alessandra" is "a". The last letter of "Amina" is "a". The last
letter of "Bianca" is "a". So we have four letters. So the final answer is "aaaa". The answer is aaaa.
Q: {question}
**Make Equation**
Q: 5 children were riding on the bus. At the bus stop 63 children got off the bus while some more
got on the bus. Then there were 14 children altogether on the bus. How many more children got
on the bus than those that got off?
A: Let’s think step by step. first step, 5 children were riding on the bus. We know 5 children is on
the bus. second step,There were 63 children that got off the bus. third step, some more got on the
bus we define as unknown x. fourth step, 14 children remained on the bus, which means we can
calculate unknow x.we have equation x+5-63 = 14, now we know x is 72. fifth step, Therefore, 72
- 63 = 9. 9 more children got on the bus than those that got off. The answer is 9.
Q: {question}

Chain of Code: Reasoning with a Language Model-Augmented Code Emulator

A powerful solution to reasoning-based problems. It generates code-based solutions that can be executed, or pseudo-executed with LLM-enabled execution emulation if code-interpreter execution fails.
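
The execute-or-emulate idea can be sketched as below. This is a simplified illustration, not the authors' implementation: `chain_of_code` runs a program line by line, and `fake_lm` is a hypothetical stand-in for the language model that guesses the state update for a line the interpreter cannot run.

```python
def chain_of_code(lines, lm_emulate):
    """Run each line with Python; fall back to an LM guess when that fails."""
    state = {}
    for line in lines:
        try:
            # Note: exec on model-written code is unsafe outside a sandbox.
            exec(line, {}, state)
        except Exception:
            # The interpreter cannot run this line, so ask the (stubbed)
            # language model to emulate its effect on the program state.
            state.update(lm_emulate(line, dict(state)))
    return state

def fake_lm(line, state):
    """Hypothetical emulator: 'knows' what a fruit is without any code."""
    if "is_fruit" in line:
        return {"fruit": state["item"] in {"apple", "pear"}}
    return {}

program = [
    "item = 'apple'",
    "fruit = is_fruit(item)",       # undefined function: handled by the LM
    "count = 1 if fruit else 0",    # ordinary Python again
]
```

The semantic step (`is_fruit`) is delegated to the model, while the arithmetic around it stays with the interpreter.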

System 2 Attention (is something you might need too)

This helps a downstream model avoid being misled by irrelevant context, or by judgement and preference expressed in the original context (termed sycophancy). The authors use an initial model pass to remove unnecessary context, which they call 'System 2 Attention', starting with instruction-tuned models that are 'proficient at reasoning and generation'.

They compare this to models that just use prompts like the one below to remove context in different manners:

    Given the following text by a user, extract the part that is unbiased and not their opinion,
    so that using that text alone would be good context for providing an unbiased answer to
    the question portion of the text.
    Please include the actual question or query that the user is asking. Separate this
    into two categories labeled with “Unbiased text context (includes all content except user’s
    bias):” and “Question/Query (does not include user bias/preference):”.

With several evaluations, including one for sycophancy, and a few variations, they show it can improve output even beyond Chain of Thought.

Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models provides a solid improvement on scientific Q&A by first extracting fundamental principles with an initial multi-shot prompt and then feeding them into a subsequent multi-shot prompt.

The authors find significant improvement over other methods.
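
The two-stage structure can be sketched as two chained calls. This is an outline under assumptions: `llm` is any callable from prompt to text, the few-shot example strings are passed in by the caller, and `step_back_answer` and `stub_llm` are illustrative names rather than anything from the paper.

```python
def step_back_answer(question, llm, principle_examples="", answer_examples=""):
    """Stage 1 extracts principles; stage 2 answers using them."""
    principle_prompt = (
        f"{principle_examples}"
        f"Question: {question}\nPrinciples Involved:"
    )
    principles = llm(principle_prompt)
    answer_prompt = (
        f"{answer_examples}"
        f"Question: {question}\nPrinciples: {principles}\nAnswer:"
    )
    return llm(answer_prompt)

calls = []

def stub_llm(prompt):
    """Toy model that records prompts and returns fixed markers."""
    calls.append(prompt)
    return "PRINCIPLES" if prompt.endswith("Principles Involved:") else "ANSWER"
```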


Here is the prompt they use to extract the first principles:

```markdown "MMLU Physics/Chemistry First-Principle Prompt"
You are an expert at Physics/Chemistry. You are given a
Physics/Chemistry problem. Your task is to extract the
Physics/Chemistry concepts and principles involved in solving
the problem. Here are a few examples:
Question:
Principles Involved:
...
Question:
Principles Involved:
Question:
Principles Involved:
```

Here is the prompt they use to use the extracted first principles and generate a final answer:

```markdown "MMLU Physics/Chemistry Final Answer Prompt"
You are an expert at Physics/Chemistry. You are given a
Physics/Chemistry problem and a set of principles involved in
solving the problem. Solve the problem step by step by following the
principles. Here are a few examples:
Question: <Question Example1>
Principles: <Principles Example1>
Answer: <Answer Example1>
Question: <Question Example5>
Principles: <Principles Example5>
Answer: <Answer Example5>
Question: <Question>
Principles: <Principles>
```

Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks

Superseded by Chain of Code. Generates code to answer financial and math-related problems.

Including Memory

There are other memory-based solutions, including RAG, that improve results. Here we highlight a few important ones.

Show your work: Scratch Pads for Intermediate Computation with Language Models

Demonstrates the use of 'scratch pads' to store intermediate results that can be recalled later for improved performance.

Planning and Reflective

Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation

    from helpers import extract_code  # helper module from the paper's setup, not shown here

    def improve_algorithm(initial_solution, utility, language_model):
        """Improves a solution according to a utility function."""
        expertise = (
            "You are an expert computer science researcher and programmer, "
            "especially skilled at optimizing algorithms."
        )
        # The two interpolated fields below are assumed: the extracted prompt
        # lost whatever followed each colon, leaving `initial_solution` unused.
        message = f"""Improve the following solution:
    {initial_solution}

    You will be evaluated based on this score function:
    {utility.str}

    You must return an improved solution. Be as creative as you can under the constraints.
    Your primary improvement must be novel and non-trivial. First, propose an idea, then implement it."""
        # Sample as many candidates as the model and the utility budget allow.
        n_messages = min(language_model.max_responses_per_call, utility.budget)
        new_solutions = language_model.batch_prompt(expertise, [message] * n_messages, temperature=0.7)
        new_solutions = extract_code(new_solutions)
        # Keep the candidate that scores highest under the utility function.
        best_solution = max(new_solutions, key=utility)
        return best_solution

Chain-of-Verification Reduces Hallucination in Large Language Models

The authors use the following Chain of Verification (CoVe) pattern to reduce hallucination:

  1. Draft an initial response.
  2. Plan verification questions to fact-check the draft.
  3. Answer those questions independently, so the answers are not biased by other responses.
  4. Generate the final verified response.
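
The four steps above can be sketched as a short pipeline. The prompt wording, the `chain_of_verification` function, and the recording `stub_llm` are assumptions made for illustration; the point shown is that each verification question is answered without the draft in its context.

```python
def chain_of_verification(query, llm):
    draft = llm(f"Answer the question.\nQ: {query}")                       # 1. draft
    plan = llm(f"List verification questions, one per line, for this draft:\n{draft}")  # 2. plan
    questions = [q.strip() for q in plan.splitlines() if q.strip()]
    # 3. Each question is answered on its own, with no draft in the prompt.
    answers = [(q, llm(f"Q: {q}")) for q in questions]
    evidence = "\n".join(f"{q} -> {a}" for q, a in answers)
    return llm(                                                            # 4. final
        f"Original question: {query}\nDraft: {draft}\n"
        f"Verified facts:\n{evidence}\nFinal answer:"
    )

log = []

def stub_llm(prompt):
    """Toy model that records prompts and returns canned replies."""
    log.append(prompt)
    if prompt.startswith("Answer the question"):
        return "DRAFT"
    if prompt.startswith("List verification"):
        return "Check A?\nCheck B?"
    if prompt.startswith("Q:"):
        return "checked"
    return "FINAL"
```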


AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect and Learn

Uses a reasoning path in which code is interleaved with LLM output, following a pattern called Plan, Execute, Inspect, and Learn (PEIL).

  1. Inspector: Ingests and summarizes data for the agent.
  2. Planner: Takes in instruction prompts, the input query, and summaries of inputs coming from the Inspector. It outputs a thought about what will be done next and an action that follows an instruction-code template. It uses multimodal assistance tools called a descriptor, a locator, and a reasoner.
  3. Executor: Takes code from the Planner as input and then calls a module to produce output. There are some additional steps, including validation checks, module execution, and post-processing.
  4. Learner: Performs a self-assessment or a ground-truth comparison to see whether updates are needed. It keeps trying until the feedback is satisfied or N attempts are reached, using commands such as no adjustment needed, revise plan, or update functions to improve its flow.

Learning to Reason and Memorize with Self-Notes Allows model to deviate from input context at any time to reason and take notes


BioPlanner: Automatic Evaluation of LLMs on Protocol Planning in Biology

Paper Abstract: The ability to automatically generate accurate protocols for scientific experiments would represent a major step towards the automation of science. Large Language Models (LLMs) have impressive capabilities on a wide range of tasks, such as question answering and the generation of coherent text and code. However, LLMs can struggle with multi-step problems and long-term planning, which are crucial for designing scientific experiments. Moreover, evaluation of the accuracy of scientific protocols is challenging, because experiments can be described correctly in many different ways, require expert knowledge to evaluate, and cannot usually be executed automatically. Here we present an automatic evaluation framework for the task of planning experimental protocols, and we introduce BioProt: a dataset of biology protocols with corresponding pseudocode representations. To measure performance on generating scientific protocols, we use an LLM to convert a natural language protocol into pseudocode, and then evaluate an LLM's ability to reconstruct the pseudocode from a high-level description and a list of admissible pseudocode functions. We evaluate GPT-3 and GPT-4 on this task and explore their robustness. We externally validate the utility of pseudocode representations of text by generating accurate novel protocols using retrieved pseudocode, and we run a generated protocol successfully in our biological laboratory. Our framework is extensible to the evaluation and improvement of language model planning abilities in other areas of science or other areas that lack automatic evaluation.


General manners of search.

LLMCompiler: An LLM Compiler for Parallel Function Calling provides a useful framework that improves latency, accuracy, and costs by orchestrating parallel calls.

The framework breaks the work into components: a task-fetching unit that dynamically identifies the tasks that can be executed and performs argument replacement on intermediate results, and an executor that performs the function calls provided by the task-fetching unit.

Toolchain*: Efficient Action Space Navigation in Large Language Models with A* Search provides an efficient tree-guided search algorithm that achieves state-of-the-art performance.

Unlike other branching methods, it allows for efficient exploration of the action space, helping to find a globally optimal series of LLM calls. It happens in 3 general steps:

  • Selection from the highest-quality frontier nodes \(\mathcal{F}(\mathcal{T})\) of tree \(\mathcal{T}\), by choosing the node \(n_{next} = \arg\min_{n \in \mathcal{F}(\mathcal{T})} f(n)\), given a cost-function oracle \(f(n)\) that provides the cost of the best plan incorporating the \(n\)-th call into the chain.
  • Expansion to create the frontier nodes, where up to \(k\) potential actions for the next step can be sampled.
  • Updating the frontier nodes to repeat the process.

The choice of the cost function is based on the \(A^*\) algorithm, where \(f(n) = g(n) + h(n)\) where \(g(n)\) is the cost of the path from the start node, and \(h(n)\) is a heuristic function that estimates the cheapest path from \(n\) to the destination goal.
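
The selection rule with \(f(n) = g(n) + h(n)\) can be sketched on a toy problem. Everything here is an assumption for illustration: plans are tuples of actions, the toy `g`, `h`, `expand`, and `is_goal` functions stand in for the paper's learned and LLM-based costs, and the task is simply to reach a total of 3 using steps of 1 or 2.

```python
def a_star_select(frontier, g, h):
    """Return the frontier node minimizing f(n) = g(n) + h(n)."""
    return min(frontier, key=lambda n: g(n) + h(n))

def a_star_search(start, expand, g, h, is_goal, max_steps=100):
    frontier = [start]
    for _ in range(max_steps):
        if not frontier:
            return None
        node = a_star_select(frontier, g, h)   # selection
        if is_goal(node):
            return node
        frontier.remove(node)
        frontier.extend(expand(node))          # expansion and frontier update
    return None

TARGET = 3
g = len                                           # cost so far: actions taken
h = lambda plan: max(0, TARGET - sum(plan)) / 2   # optimistic steps remaining
expand = lambda plan: [plan + (1,), plan + (2,)] if sum(plan) < TARGET else []
is_goal = lambda plan: sum(plan) == TARGET

best = a_star_search((), expand, g, h, is_goal)
```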

Their choice of \(g(n)\) is generally the sum of single-step costs from ancestor nodes. More accurately they create a geometric sum of two different step value functions.

One step function is a task-specific heuristic function that maximizes the longest-common subsequence score over other paths. The longest-common subsequence score finds the longest-common subsequence between plan \(s_n\) and other plans \(m_j\) and divides by the smaller lengths of the paths \(s_n\) and \(m_j\).
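
The longest-common-subsequence score described above can be sketched with the standard dynamic program. The function names and the representation of plans as lists of action names are assumptions of this sketch.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def lcs_score(plan, other_plans):
    """Best LCS overlap with any other plan, normalized by the shorter length."""
    return max(
        (lcs_length(plan, m) / min(len(plan), len(m)) for m in other_plans if plan and m),
        default=0.0,
    )
```

For example, the plan `["search", "open", "click"]` shares the subsequence `["search", "click"]` with a two-step plan, so the overlap is 2 divided by the shorter length 2.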

The other step function is a self-consistency frequency that takes an ensemble approach to generate the next steps. It calculates the number of actions that arrive at step n using non-semantically equivalent reasoning steps, divided by the number of k samples.

Their choice of the future cost \(h(n)\) is a multiplicative combination of a similar task-specific heuristic and an imagination score, enabled by an LLM.

The future task-specific heuristic calculates the average fractional position of action found within all plans.

The imagination score directly queries the LLM to imagine more concrete steps until the target node \(n_T\), then computes the ratio between the number of ancestors of the current node \(n\) and the number of steps to the target node. A higher score 'suggests the imagined plan closely captures the path to the current step, indicating that fewer remaining steps are needed to accomplish the task in the imagination of LLMs.'

Algorithm of Thoughts A general extension of Chain of Thought, similar to Graph of Thoughts


Graph of Thoughts Generalizes Chain of Thought, Tree of Thoughts, and similar systems of thought


Graph of Thought

An excellent thought on what to consider next when dealing with knowledge (or other output like information) generation chains.

Meta Tree of thought


Strategic Reasoning with Language Models Uses game trees and observed and inferred beliefs to achieve closer to optimal results.

Powerful to consider for inferred beliefs and interacting in situations where negotiation or games are being played.

Large Language Model Guided Tree-of-Thought


Tree of Thoughts: Deliberate Problem Solving with Large Language Models A method that allows for idea-expansion and selection of the final result output by choosing the best at each stage.

Figure: The thought flow.

Prompts compared

    standard_prompt = '''
    Write a coherent passage of 4 short paragraphs. The end sentence of each paragraph must be: {input}
    '''

    cot_prompt = '''
    Write a coherent passage of 4 short paragraphs. The end sentence of each paragraph must be: {input}

    Make a plan then write. Your output should be of the following format:

    Your plan here.

    Your passage here.
    '''

    vote_prompt = '''Given an instruction and several choices, decide which choice is most promising. Analyze each choice in detail, then conclude in the last line "The best choice is {s}", where s the integer id of the choice.
    '''

    compare_prompt = '''Briefly analyze the coherency of the following two passages. Conclude in the last line "The more coherent passage is 1", "The more coherent passage is 2", or "The two passages are similarly coherent".
    '''

    score_prompt = '''Analyze the following passage, then at the last line conclude "Thus the coherency score is {s}", where s is an integer from 1 to 10.
    '''
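
The expand-then-select behavior can be sketched as a breadth-first beam search. This is a toy outline, not the repository's code: `propose` and `score` would be LLM calls using prompts like those above, while here they are simple string functions so the search can be run directly.

```python
def tree_of_thoughts(root, propose, score, depth=3, beam=2):
    """Expand every kept state, then keep only the `beam` best candidates."""
    frontier = [root]
    for _ in range(depth):
        candidates = [child for state in frontier for child in propose(state)]
        if not candidates:
            break
        frontier = sorted(candidates, key=score, reverse=True)[:beam]
    return max(frontier, key=score)

# Toy problem: grow a string one letter at a time; more "b"s score higher.
toy_propose = lambda state: [state + "a", state + "b"]
toy_score = lambda state: state.count("b")
```

With `depth=3` and `beam=2`, the search keeps the two most promising partial strings at each level and returns the best final one.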


Teaching Large Language Models to Self-Debug transcoder

A coding-focused LLM system that continuously improves itself.

Language Models can Solve Computer Tasks uses Recursive Criticism and Improvement (RCI).

The method follows a plan, critique, improve pattern. Explicit RCI: "Review your previous answer and find problems with your answer." → "Based on the problems you found, improve your answer." The model recursively criticizes and improves its output. This sort of prompting outperforms Chain of Thought, and combining the two works even better.

Structural and Task Decomposition

Breaking down the input with a divide-and-conquer approach is valuable for more complex requests. Considering separate perspectives, within the same model or within separate model calls with different prompt-inceptions as in agent systems, can improve performance.

ProTIP: Progressive Tool Retrieval Improves Planning

The authors demonstrate that a dynamic contrastive learning-based framework implicitly performs task decomposition without explicit subtask requirements, while retaining subtask atomicity.

Skeleton of Thought

A nice structure that resembles the thoughtful creation of answers and allows for parallelization, and hence speedup, with comparable or better results in answer generation.

Skeleton prompt template

    [User:] You’re an organizer responsible for only giving the skeleton (not the full content) for answering the question.
    Provide the skeleton in a list of points (numbered 1., 2., 3., etc.) to answer the question. Instead of writing a full
    sentence, each skeleton point should be very short with only 3∼5 words. Generally, the skeleton should have 3∼10
    What are the typical types of Chinese dishes?
    1. Dumplings.
    2. Noodles.
    3. Dim Sum.
    4. Hot Pot.
    5. Wonton.
    6. Ma Po Tofu.
    7. Char Siu.
    8. Fried Rice.
    What are some practical tips for individuals to reduce their carbon emissions?
    1. Energy conservation.
    2. Efficient transportation.
    3. Home energy efficiency.
    4. Reduce water consumption.
    5. Sustainable diet.
    6. Sustainable travel.
    Now, please provide the skeleton for the following question.
    [Assistant:] 1.

Point expanding prompt template

    [User:] You’re responsible for continuing the writing of one and only one point in the overall answer to the following
    The skeleton of the answer is
    Continue and only continue the writing of point {point index}. Write it **very shortly** in 1∼2 sentence and
    do not continue with other points!
    [Assistant:] {point index}. {point skeleton}
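
The two templates combine into a skeleton stage followed by parallel point expansion. The sketch below assumes `llm` is a thread-safe callable from prompt to text; the shortened prompt strings, `skeleton_of_thought`, and `stub_llm` are illustrative stand-ins rather than the paper's templates.

```python
import re
from concurrent.futures import ThreadPoolExecutor

def skeleton_of_thought(question, llm):
    """Ask for a numbered skeleton, then expand every point in parallel."""
    skeleton = llm(f"Give a short numbered skeleton for: {question}")
    points = re.findall(r"^\s*(\d+)\.\s*(.+)$", skeleton, flags=re.MULTILINE)

    def expand(point):
        index, text = point
        body = llm(f"Question: {question}\nExpand only point {index}: {text}")
        return f"{index}. {body}"

    with ThreadPoolExecutor() as pool:
        # Executor.map returns results in input order, so points stay sorted.
        return "\n".join(pool.map(expand, points))

def stub_llm(prompt):
    """Toy model: fixed skeleton, and an echo for each expansion."""
    if prompt.startswith("Give a short"):
        return "1. Alpha.\n2. Beta."
    return "Expanded: " + prompt.rsplit(": ", 1)[-1]
```

The speedup comes from the expansion calls being independent of one another, so they can be issued concurrently instead of decoding one long answer sequentially.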

Question Decomposition Improves the Faithfulness of Model-Generated Reasoning

A nice discussion on it

Unleashing Cognitive Synergy in Large Language Models: A Task-Solving Agent Through Multi-person Self-Collaboration

Uses a prompt that initiates a group of personas within the same LLM call to facilitate collaborative analysis and creation of the final output. It shows solid improvement, but comparisons to other techniques are uncertain.

Example prompt

```python title="Trivia writing SPP"

spp_prompt = '''When faced with a task, begin by identifying the participants who will contribute to solving the task. Then, initiate a multi-round collaboration process until a final solution is reached. The participants will give critical comments and detailed suggestions whenever necessary.

Here are some examples:
Example Task 1: Use numbers and basic arithmetic operations (+ - * /) to obtain 24. You need to use all numbers, and each number can only be used once.
Input: 6 12 1 1

Participants: AI Assistant (you); Math Expert

Start collaboration!

Math Expert: Let's analyze the task in detail. You need to make sure that you meet the requirement, that you need to use exactly the four numbers (6 12 1 1) to construct 24. To reach 24, you can think of the common divisors of 24 such as 4, 6, 8, 3 and try to construct these first. Also you need to think of potential additions that can reach 24, such as 12 + 12.
AI Assistant (you): Thanks for the hints! Here's one initial solution: (12 / (1 + 1)) * 6 = 24
Math Expert: Let's check the answer step by step. (1+1) = 2, (12 / 2) = 6, 6 * 6 = 36 which is not 24! The answer is not correct. Can you fix this by considering other combinations? Please do not make similar mistakes.
AI Assistant (you): Thanks for pointing out the mistake. Here is a revised solution considering 24 can also be reached by 3 * 8: (6 + 1 + 1) * (12 / 4) = 24.
Math Expert: Let's first check if the calculation is correct. (6 + 1 + 1) = 8, 12 / 4 = 3, 8 * 3 = 24. The calculation is correct, but you used 6 1 1 12 4 which is not the same as the input 6 12 1 1. Can you avoid using a number that is not part of the input?
AI Assistant (you): You are right, here is a revised solution considering 24 can be reached by 12 + 12 and without using any additional numbers: 6 * (1 - 1) + 12 = 24.
Math Expert: Let's check the answer again. 1 - 1 = 0, 6 * 0 = 0, 0 + 12 = 12. I believe you are very close, here is a hint: try to change the "1 - 1" to "1 + 1".
AI Assistant (you): Sure, here is the corrected answer:  6 * (1+1) + 12 = 24
Math Expert: Let's verify the solution. 1 + 1 = 2, 6 * 2 = 12, 12 + 12 = 24. You used 1 1 6 12 which is identical to the input 6 12 1 1. Everything looks good!

Finish collaboration!

Final answer: 6 * (1 + 1) + 12 = 24
'''
```




Teach LLMs to Personalize – An Approach inspired by Writing Education


Constraining outputs

Certified Reasoning with Language models A 'logical guide' tool that an LLM can use.

It "uses constrained decoding to ensure the model will incrementally generate one of the valid outputs."

Outlines guides model generation by constraining next-token logits so that the output conforms to a regex, JSON, or a pydantic schema. It is compatible with all models.

Also provides a way to functionalize templates to separate prompt logic.

Automated chain discovery, selection, and creation.

Auto-CoT: Automatic Chain of Thought Prompting in Large Language Models

This algorithm samples exemplars to construct demonstrations that improve the accuracy of multi-shot outcomes using the Chain-of-Thought prompting method.

Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine
  • GPT4 + Simple Prompts (86.1, MedQA task) 
  • GPT4 + Complex Prompts (90.2, MedQA task)

The authors use 'in-context learning' (more like RAG) to identify 'winning' prompting chains for specific problem sets.

Their prompting strategies can efficiently steer GPT-4 to achieve top performance on medical problems (90% on MedQA dataset). 

The winning composition of prompting strategies is fairly elaborate including multiple steps:

  1. Preprocessing Phase:
     - Iterate through each question in the training dataset.
     - Generate an embedding vector for each question using a lightweight embedding model, such as OpenAI's text-embedding-ada-002.
     - Use GPT-4 to generate a chain of thought and a prediction of the final answer.
     - Compare the GPT-4 generated answer against the ground truth (correct answer).
     - Store questions, their embedding vectors, chains of thought, and answers if the prediction is correct; otherwise, discard them.
  2. Inference Step:
     - Compute the embedding for the test question using the same embedding model as in preprocessing.
     - Select the most similar examples from the preprocessed training data using k-Nearest Neighbors (kNN) and cosine similarity as the distance function.
     - Format the selected examples as context for GPT-4.
     - Repeat the following steps several times (e.g., five times as configured):
         - Shuffle the answer choices for the test question.
         - Prompt GPT-4 with the context and shuffled test question to generate a chain of thought and a candidate answer.
     - Determine the final predicted answer by taking a majority vote of the generated candidate answers.
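
The inference step can be sketched in plain Python. This is an illustration under assumptions: embeddings are given as lists of floats, `store` holds the (embedding, example) pairs kept during preprocessing, and `llm` is a hypothetical callable that returns the text of the chosen answer.

```python
import math
import random
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def knn_examples(query_vec, store, k=5):
    """Pick the k stored examples whose embeddings are closest to the query."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [example for _, example in ranked[:k]]

def ensemble_answer(question, choices, examples, llm, runs=5, seed=0):
    """Shuffle the choices on each run and majority-vote the answers."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(runs):
        shuffled = choices[:]
        rng.shuffle(shuffled)                  # reduces position bias
        votes[llm(examples, question, shuffled)] += 1
    return votes.most_common(1)[0][0]
```

Voting on the answer text rather than the option letter is what makes the shuffled runs comparable with one another.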

Additional Details:

  • The strategy uses 5 kNN-selected few-shot exemplars and performs 5 parallel API calls in the ensemble procedure.
  • Ablation studies suggest that increasing the number of few-shot exemplars and ensemble items can yield better performance.
  • The general methodology of combining few-shot exemplar selection, self-generated chain-of-thought reasoning, and majority vote ensembling is not limited to medical texts and can be adapted to other domains and problem types.


  • Assumes availability of training ground truth data needed for preprocessing steps
  • Costs (multiple llm inference calls, latency). This will matter depending on use case and accuracy requirements 
  • Problem Domain - this will work best for tasks that have a single valid objective answer


Chain Optimization

Problems such as hallucinations can be mitigated through downstream processing methods.

A stitch in time saves Nine

A process to mitigate model hallucination using RAG.