Memory

Just as for people, memory plays a crucial role in enhancing the efficiency of information generation. Memory can be global or external to an agent or agent-network, or internal to it, accumulated from the experiences gained through the agent or agent-network's efforts. Each of these types of memory can be used to extract information that is then placed into the LLM's prompt-context, allowing more accurate generation.

Here we discuss experiential memory based on the activity or actions of one or many agents.

Users of LLM chat interfaces that span multiple sessions may benefit from stored experiential memory. Guarded by default or manually configured safeguards, experiential memory can support focused and enduring memory tracks with specific purposes. For instance, when a user has spent time creating something from scratch in a particularly effective manner, that 'effective manner' can be recalled to minimize the time needed to do the same thing, or something similar, again. This is likely why OpenAI enabled memory for its agents. How this memory is managed and accessed is of prime importance to retention and to experiential transfer: the sharing of experiences between different agents without having to 'repeat' information.

Experiential Memory

Memory can be as simple as a conversation buffer that keeps track of what has been said. These buffers can be 'private' to a single agent, or they can facilitate communication between agents by storing response stacks that include agent-environment interactions.
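
Below is a minimal sketch of such a buffer as a plain Python class; the names (`ConversationBuffer`, `as_context`) are illustrative rather than taken from any particular library:

```python
from dataclasses import dataclass, field


@dataclass
class ConversationBuffer:
    """Minimal in-memory buffer of agent/environment turns (illustrative)."""

    max_turns: int = 50  # cap the buffer to bound prompt size
    turns: list[dict] = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        self.turns.append({"role": role, "content": content})
        # Drop the oldest turns once the buffer exceeds its cap.
        self.turns = self.turns[-self.max_turns:]

    def as_context(self) -> str:
        # Render the buffer as text to be placed into the LLM's prompt-context.
        return "\n".join(f"{t['role']}: {t['content']}" for t in self.turns)


buffer = ConversationBuffer()
buffer.add("user", "Summarize the last experiment.")
buffer.add("agent", "The run converged after 12 epochs.")
print(buffer.as_context())
```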

Text-based memory can consist of a verbatim text record or some form of compressed summary that reduces memory overhead. The memory may be stored in simple file-based formats or in more complex databases, either with or without a schema that allows for a generally structured representation.
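
As a sketch of the simple file-based case, assuming a hypothetical `agent_memory.json` file and an illustrative record schema (`id`, verbatim `text`, optional `summary`):

```python
import json
from pathlib import Path

MEMORY_FILE = Path("agent_memory.json")  # hypothetical file name


def save_memory(entry_id: str, text: str, summary: str | None = None) -> None:
    """Append one record, keeping the verbatim text plus an optional
    compressed summary to reduce prompt overhead at retrieval time."""
    records = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []
    records.append({"id": entry_id, "text": text, "summary": summary})
    MEMORY_FILE.write_text(json.dumps(records, indent=2))


def load_memories() -> list[dict]:
    """Return all stored records (empty if nothing has been saved yet)."""
    return json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []
```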

Here are some general types of memory:

  • Conversation Buffers
  • Scratch-pads
  • Gists and Summarization
  • Action-success lookups

For example, OpenAI has launched memory for ChatGPT, which stores relevant memories in a manner that gives the user control over what is stored. It does not yet allow memories to be compartmentalized into groups, which could help focus their relevance to generated content.

Storage and Retrieval Methods

Memory can be retrieved via lookup methods that involve database queries (SQL, graph) or vector lookups. Memories can also be stored in simple ASCII documents and searched via keyword lookups.
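
As a sketch of the simplest of these, a keyword lookup over plain-text memory files might look like the following (the directory layout and function name are assumptions):

```python
from pathlib import Path


def keyword_lookup(memory_dir: str, keywords: list[str]) -> list[str]:
    """Return lines from plain-text memory files that mention any keyword."""
    hits = []
    for path in Path(memory_dir).glob("*.txt"):
        for line in path.read_text().splitlines():
            if any(kw.lower() in line.lower() for kw in keywords):
                hits.append(f"{path.name}: {line}")
    return hits


# e.g. pull prior notes about deployments before building the prompt
print(keyword_lookup("memories", ["deploy", "rollback"]))
```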

Traditional databases

Agents can access, and even generate, databases that rely on query languages such as SQL, NoSQL databases, or even 'CSV-type' information stores.
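
For instance, here is a minimal sketch using Python's built-in sqlite3 module; the table schema is an illustrative assumption:

```python
import sqlite3

# In-memory SQLite store holding agent experiences (illustrative schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memory (agent TEXT, key TEXT, value TEXT)")
conn.execute(
    "INSERT INTO memory VALUES (?, ?, ?)",
    ("planner", "build_command", "make release"),
)

# An agent retrieves a stored experience with an ordinary SQL query.
row = conn.execute(
    "SELECT value FROM memory WHERE agent = ? AND key = ?",
    ("planner", "build_command"),
).fetchone()
print(row[0])  # -> make release
```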

Graph Databases

Graph databases provide the ability to place information in relational contexts. Whether native graph stores or graph layers over other databases, they can allow for rich representations of how things are connected, though the resulting models can become overly complex. They are often interacted with using query languages like Cypher; extracting exactly the right information can be challenging, but when the query is right it is very powerful.

Neo4j has built a semantic layer, as shown in the tomasonjo/llm-movieagent repository.
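
As a sketch of the query pattern only (not the semantic layer itself), a Cypher lookup through the official neo4j Python driver might look like this; the connection details and movie-graph schema are placeholders:

```python
from neo4j import GraphDatabase  # pip install neo4j

# Placeholder connection details.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Cypher expresses the relationship ("who acted in what") directly.
query = """
MATCH (a:Actor {name: $name})-[:ACTED_IN]->(m:Movie)
RETURN m.title AS title
"""

with driver.session() as session:
    for record in session.run(query, name="Tom Hanks"):
        print(record["title"])

driver.close()
```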

Queries can be generated by an LLM and executed by an interpreter, though it is not guaranteed that the generated queries will be accurate. [TODO: Find reference some_reference_on_LLM_SQL]

References

For more information on memory implementations and caching, refer to the following resources:

  • LangChain memory
  • LangChain llm_caching

Vector databases

Vector databases and similarity-search libraries, such as Pinecone, Qdrant, Weaviate, Chroma, Faiss, Redis, Milvus, and ScaNN, index content as embeddings and query it by vector similarity. This enables efficient semantic search.
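
A minimal sketch of semantic retrieval, using Chroma's Python client as one concrete example (the collection name and stored memories are illustrative):

```python
import chromadb  # pip install chromadb

client = chromadb.Client()
memories = client.create_collection(name="agent_memories")

# Chroma embeds the documents with a default embedding model on insert.
memories.add(
    ids=["m1", "m2"],
    documents=[
        "The deploy failed until we pinned the CUDA version.",
        "The user prefers summaries under 100 words.",
    ],
)

# Semantic search: nearest memories by embedding similarity, not keywords.
results = memories.query(query_texts=["how did we fix the deployment?"], n_results=1)
print(results["documents"][0])
```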

Example vector databases

For more information, see Vector Databases (a primer by Pinecone.io).

VectorHub: Evaluation of multiple vector databases

"Vector Hub is a free and open-sourced learning hub for people interested in adding vector retrieval to their ML stack. On VectorHub you will find practical resources to help you" VDB comparisons

Text

BooookScore: A systematic exploration of book-length summarization in the era of LLMs

Developments

The authors present an effective way to summarize book-length texts using two methods: (1) hierarchical merging of chunk-level summaries, and (2) incremental updating of a running summary.

Results

Human evaluation shows that "hierarchical merging produces more coherent summaries but may lack detail compared to incremental updating; closed-source models like GPT-4 and Claude 2 generate the most coherent summaries; and increasing chunk size can significantly improve incremental updating."

Paper
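
A rough sketch of the two strategies, where `summarize` stands in for an LLM call (the function names and prompts are assumptions, not the authors' code):

```python
def summarize(text: str) -> str:
    """Stand-in for an LLM summarization call; truncation is used here
    only so the control flow can be executed end to end."""
    return text[:200]


def hierarchical_merge(chunks: list[str], fan_in: int = 4) -> str:
    """Method 1: summarize each chunk, then repeatedly merge groups of
    summaries until a single book-level summary remains."""
    summaries = [summarize(c) for c in chunks]
    while len(summaries) > 1:
        summaries = [
            summarize("\n".join(summaries[i : i + fan_in]))
            for i in range(0, len(summaries), fan_in)
        ]
    return summaries[0]


def incremental_update(chunks: list[str]) -> str:
    """Method 2: keep one running summary and update it chunk by chunk."""
    running = ""
    for chunk in chunks:
        running = summarize(f"Current summary:\n{running}\n\nNew text:\n{chunk}")
    return running
```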

ReadAgent: A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts

Jupyter notebook

Developments

The authors present a manner of reading long documents by compressing them into gist memories, in order to deal with long contexts.

Problem

The context length of long inputs limits a model's ability to perform effectively and efficiently.

Solution

Inspired by how people interactively read long documents, the authors implement a simple prompting-based system (sketched in code after the list) that

  1. Decides what content should be stored together in a memory episode,
  2. Compresses those memories into short episodic memories called gist memories, and
  3. Takes actions to look up sections in the original text if the memory needs to be refreshed.
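
A loose sketch of steps 2 and 3, where `llm` is a placeholder for a model call and each page is treated as one episode for simplicity; the prompts and helper names are assumptions, not the authors' implementation:

```python
def llm(prompt: str) -> str:
    """Placeholder for a call to the underlying language model."""
    raise NotImplementedError("wire this to a real LLM")


def build_gists(pages: list[str]) -> list[str]:
    # Step 2: compress each episode (simplified to one page per episode)
    # into a short gist memory.
    return [llm(f"Shorten this passage into a brief gist:\n{p}") for p in pages]


def answer(question: str, pages: list[str], gists: list[str]) -> str:
    # Step 3: let the model decide which original pages to re-read,
    # then answer from the gists plus the retrieved full text.
    numbered = "\n".join(f"[{i}] {g}" for i, g in enumerate(gists))
    picks = llm(
        f"Question: {question}\nWhich page numbers should be re-read?\n{numbered}"
    )
    lookups = [pages[int(p)] for p in picks.split(",") if p.strip().isdigit()]
    return llm(
        f"Gists:\n{numbered}\n\nRe-read pages:\n{''.join(lookups)}\n\nQ: {question}"
    )
```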

Results

This simple method improves performance on reading-comprehension tasks while enabling effective context windows that are 3-20x larger.

Paper