| #1 | # Mem0: Building Production‑Ready AI Agents with Scalable Long‑Term Memory |
| #2 | |
| #3 | [arXiv:2504.19413](https://arxiv.org/abs/2504.19413) |
| #4 | [Mem0 research page](https://mem0.ai/research) |
| #5 | |
| #6 | This repository contains the code and dataset for our paper: **Mem0: Building Production‑Ready AI Agents with Scalable Long‑Term Memory**. |
| #7 | |
| #8 | ## 📋 Overview |
| #9 | |
| #10 | This project evaluates Mem0 and compares it with different memory and retrieval techniques for AI systems: |
| #11 | |
| #12 | 1. **Established LOCOMO Benchmarks**: We evaluate against five established approaches from the literature: LoCoMo, ReadAgent, MemoryBank, MemGPT, and A-Mem. |
| #13 | 2. **Open-Source Memory Solutions**: We test promising open-source memory architectures including LangMem, which provides flexible memory management capabilities. |
| #14 | 3. **RAG Systems**: We implement Retrieval-Augmented Generation with various configurations, testing different chunk sizes and retrieval counts to optimize performance. |
| #15 | 4. **Full-Context Processing**: We examine the effectiveness of passing the entire conversation history within the context window of the LLM as a baseline approach. |
| #16 | 5. **Proprietary Memory Systems**: We evaluate OpenAI's built-in memory feature available in their ChatGPT interface to compare against commercial solutions. |
| #17 | 6. **Third-Party Memory Providers**: We incorporate Zep, a specialized memory management platform designed for AI agents, to assess the performance of dedicated memory infrastructure. |
| #18 | |
| #19 | We test these techniques on the LOCOMO dataset, which contains conversational data with various question types to evaluate memory recall and understanding. |
| #20 | |
| #21 | ## 🔍 Dataset |
| #22 | |
| #23 | The LOCOMO dataset used in our experiments can be downloaded from our Google Drive repository: |
| #24 | |
| #25 | [Download LOCOMO Dataset](https://drive.google.com/drive/folders/1L-cTjTm0ohMsitsHg4dijSPJtqNflwX-?usp=drive_link) |
| #26 | |
| #27 | The dataset contains conversational data specifically designed to test memory recall and understanding across various question types and complexity levels. |
| #28 | |
| #29 | Place the dataset files in the `dataset/` directory: |
| #30 | - `locomo10.json`: Original dataset |
| #31 | - `locomo10_rag.json`: Dataset formatted for RAG experiments |
| #32 | |
| #33 | ## 📁 Project Structure |
| #34 | |
| #35 | ``` |
| #36 | . |
| #37 | ├── src/ # Source code for different memory techniques |
| #38 | │ ├── mem0/ # Implementation of the Mem0 technique |
| #39 | │ ├── openai/ # Implementation of the OpenAI memory |
| #40 | │ ├── zep/ # Implementation of the Zep memory |
| #41 | │ ├── rag.py # Implementation of the RAG technique |
| #42 | │ └── langmem.py # Implementation of the LangMem technique |
| #43 | ├── metrics/ # Code for evaluation metrics |
| #44 | ├── results/ # Results of experiments |
| #45 | ├── dataset/ # Dataset files |
| #46 | ├── evals.py # Evaluation script |
| #47 | ├── run_experiments.py # Script to run experiments |
| #48 | ├── generate_scores.py # Script to generate scores from results |
| #49 | └── prompts.py # Prompts used for the models |
| #50 | ``` |
| #51 | |
| #52 | ## 🚀 Getting Started |
| #53 | |
| #54 | ### Prerequisites |
| #55 | |
| #56 | Create a `.env` file with your API keys and configurations. The following keys are required: |
| #57 | |
| #58 | ``` |
| #59 | # OpenAI API key for GPT models and embeddings |
| #60 | OPENAI_API_KEY="your-openai-api-key" |
| #61 | |
| #62 | # Mem0 API keys (for Mem0 and Mem0+ techniques) |
| #63 | MEM0_API_KEY="your-mem0-api-key" |
| #64 | MEM0_PROJECT_ID="your-mem0-project-id" |
| #65 | MEM0_ORGANIZATION_ID="your-mem0-organization-id" |
| #66 | |
| #67 | # Model configuration |
| #68 | MODEL="gpt-4o-mini" # or your preferred model |
| #69 | EMBEDDING_MODEL="text-embedding-3-small" # or your preferred embedding model |
| #70 | ZEP_API_KEY="api-key-from-zep" # Zep API key (for Zep experiments) |
| #71 | ``` |
| #72 | |
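Before launching a long run, it can help to confirm that every key from the example above is set. The sketch below is not part of the repository: `REQUIRED_KEYS` and `missing_keys` are hypothetical names, and it assumes the `.env` values have already been loaded into the process environment (for example by your shell or a dotenv loader).

```python
import os

# Keys listed in the .env example above. Treating all of them as required
# for every technique is an assumption; ZEP_API_KEY, for instance, is
# presumably only needed for the Zep experiments.
REQUIRED_KEYS = [
    "OPENAI_API_KEY",
    "MEM0_API_KEY",
    "MEM0_PROJECT_ID",
    "MEM0_ORGANIZATION_ID",
    "MODEL",
    "EMBEDDING_MODEL",
    "ZEP_API_KEY",
]


def missing_keys(env=None):
    """Return the required keys that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing configuration keys:", ", ".join(missing))
    else:
        print("All configuration keys are set.")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise without touching the real environment.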
| #73 | ### Running Experiments |
| #74 | |
| #75 | You can run experiments using the provided Makefile commands: |
| #76 | |
| #77 | #### Memory Techniques |
| #78 | |
| #79 | ```bash |
| #80 | # Run Mem0 experiments |
| #81 | make run-mem0-add # Add memories using Mem0 |
| #82 | make run-mem0-search # Search memories using Mem0 |
| #83 | |
| #84 | # Run Mem0+ experiments (with graph-based search) |
| #85 | make run-mem0-plus-add # Add memories using Mem0+ |
| #86 | make run-mem0-plus-search # Search memories using Mem0+ |
| #87 | |
| #88 | # Run RAG experiments |
| #89 | make run-rag # Run RAG with chunk size 500 |
| #90 | make run-full-context # Run RAG with full context |
| #91 | |
| #92 | # Run LangMem experiments |
| #93 | make run-langmem # Run LangMem |
| #94 | |
| #95 | # Run Zep experiments |
| #96 | make run-zep-add # Add memories using Zep |
| #97 | make run-zep-search # Search memories using Zep |
| #98 | |
| #99 | # Run OpenAI experiments |
| #100 | make run-openai # Run OpenAI experiments |
| #101 | ``` |
| #102 | |
| #103 | Alternatively, you can run experiments directly with custom parameters: |
| #104 | |
| #105 | ```bash |
| #106 | python run_experiments.py --technique_type [mem0|rag|langmem] [additional parameters] |
| #107 | ``` |
| #108 | |
| #109 | #### Command-line Parameters |
| #110 | |
| #111 | | Parameter | Description | Default | |
| #112 | |-----------|-------------|---------| |
| #113 | | `--technique_type` | Memory technique to use (mem0, rag, langmem) | mem0 | |
| #114 | | `--method` | Method to use (add, search) | add | |
| #115 | | `--chunk_size` | Chunk size for processing | 1000 | |
| #116 | | `--top_k` | Number of top memories to retrieve | 30 | |
| #117 | | `--filter_memories` | Whether to filter memories | False | |
| #118 | | `--is_graph` | Whether to use graph-based search | False | |
| #119 | | `--num_chunks` | Number of chunks to process for RAG | 1 | |
| #120 | |
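For readers who want to see how the parameters in the table fit together, here is a minimal `argparse` sketch that mirrors the documented names and defaults. It is an illustration only and not the actual contents of `run_experiments.py`: the `build_parser` helper is hypothetical, and treating `--filter_memories` and `--is_graph` as on/off switches is an assumption.

```python
import argparse


def build_parser():
    """Build a parser mirroring the parameter table (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Run memory experiments (sketch, not the real script)"
    )
    # The table lists mem0, rag and langmem; the real script may accept more,
    # so the value is deliberately not restricted with `choices` here.
    parser.add_argument("--technique_type", type=str, default="mem0",
                        help="Memory technique to use")
    parser.add_argument("--method", type=str, default="add",
                        choices=["add", "search"], help="Method to use")
    parser.add_argument("--chunk_size", type=int, default=1000,
                        help="Chunk size for processing")
    parser.add_argument("--top_k", type=int, default=30,
                        help="Number of top memories to retrieve")
    # Assumption: boolean options behave as flags that default to False.
    parser.add_argument("--filter_memories", action="store_true",
                        help="Whether to filter memories")
    parser.add_argument("--is_graph", action="store_true",
                        help="Whether to use graph-based search")
    parser.add_argument("--num_chunks", type=int, default=1,
                        help="Number of chunks to process for RAG")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```

With no arguments the parsed values match the "Default" column above; options such as `--technique_type rag --chunk_size 500` override them.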
| #121 | ### 📊 Evaluation |
| #122 | |
| #123 | To evaluate results, run: |
| #124 | |
| #125 | ```bash |
| #126 | python evals.py --input_file [path_to_results] --output_file [output_path] |
| #127 | ``` |
| #128 | |
| #129 | This script: |
| #130 | 1. Processes each question-answer pair |
| #131 | 2. Calculates BLEU and F1 scores automatically |
| #132 | 3. Uses an LLM judge to evaluate answer correctness |
| #133 | 4. Saves the combined results to the output file |
| #134 | |
| #135 | ### 📈 Generating Scores |
| #136 | |
| #137 | Generate final scores with: |
| #138 | |
| #139 | ```bash |
| #140 | python generate_scores.py |
| #141 | ``` |
| #142 | |
| #143 | This script: |
| #144 | 1. Loads the evaluation metrics data |
| #145 | 2. Calculates mean scores for each category (BLEU, F1, LLM) |
| #146 | 3. Reports the number of questions per category |
| #147 | 4. Calculates overall mean scores across all categories |
| #148 | |
| #149 | Example output: |
| #150 | ``` |
| #151 | Mean Scores Per Category: |
| #152 | bleu_score f1_score llm_score count |
| #153 | category |
| #154 | 1 0.xxxx 0.xxxx 0.xxxx xx |
| #155 | 2 0.xxxx 0.xxxx 0.xxxx xx |
| #156 | 3 0.xxxx 0.xxxx 0.xxxx xx |
| #157 | |
| #158 | Overall Mean Scores: |
| #159 | bleu_score 0.xxxx |
| #160 | f1_score 0.xxxx |
| #161 | llm_score 0.xxxx |
| #162 | ``` |
| #163 | |
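The aggregation described above can be sketched with the standard library alone. This is a hedged illustration rather than the code in `generate_scores.py`: the `summarize` function is hypothetical, and it assumes each evaluated question is a record with `category`, `bleu_score`, `f1_score` and `llm_score` fields, and that the overall means are taken over all questions.

```python
from collections import defaultdict
from statistics import mean

METRICS = ("bleu_score", "f1_score", "llm_score")


def summarize(records):
    """Return (per_category, overall) mean scores for a list of records."""
    by_category = defaultdict(list)
    for record in records:
        by_category[record["category"]].append(record)

    per_category = {}
    for category, rows in sorted(by_category.items()):
        # Mean of each metric within the category, plus the question count.
        summary = {metric: mean(row[metric] for row in rows) for metric in METRICS}
        summary["count"] = len(rows)
        per_category[category] = summary

    # Assumption: overall means are computed over all questions,
    # not as an unweighted average of the per-category means.
    overall = {metric: mean(row[metric] for row in records) for metric in METRICS}
    return per_category, overall


if __name__ == "__main__":
    demo = [
        {"category": 1, "bleu_score": 0.5, "f1_score": 0.5, "llm_score": 1},
        {"category": 2, "bleu_score": 1.0, "f1_score": 0.75, "llm_score": 1},
    ]
    per_category, overall = summarize(demo)
    print(per_category)
    print(overall)
```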
| #164 | ## 📏 Evaluation Metrics |
| #165 | |
| #166 | We use several metrics to evaluate the performance of different memory techniques: |
| #167 | |
| #168 | 1. **BLEU Score**: Measures the n-gram overlap between the model's response and the ground truth |
| #169 | 2. **F1 Score**: The harmonic mean of precision and recall, computed between the model's response and the ground truth |
| #170 | 3. **LLM Score**: A binary score (0 or 1) determined by an LLM judge evaluating the correctness of responses |
| #171 | 4. **Token Consumption**: Number of tokens required to generate the final answer |
| #172 | 5. **Latency**: Time spent on memory search and on generating the response |
| #173 | |
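To make the two overlap metrics concrete, the sketch below computes a token-level F1 and a unigram BLEU (clipped unigram precision multiplied by the brevity penalty). It is a simplified stand-in, not the code in `metrics/`: the function names are hypothetical, the whitespace tokenizer is an assumption, and the repository may use a different n-gram order, smoothing, or text normalization.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def token_f1(prediction, reference):
    """Harmonic mean of token-level precision and recall."""
    pred, ref = tokenize(prediction), tokenize(reference)
    if not pred or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both texts.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def bleu1(prediction, reference):
    """Unigram BLEU: clipped unigram precision times the brevity penalty."""
    pred, ref = tokenize(prediction), tokenize(reference)
    if not pred or not ref:
        return 0.0
    clipped = sum((Counter(pred) & Counter(ref)).values())
    precision = clipped / len(pred)
    # Predictions shorter than the reference are penalized.
    if len(pred) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(pred))
    return brevity_penalty * precision


if __name__ == "__main__":
    print(token_f1("the cat", "the cat sat"))
    print(bleu1("the cat", "the cat sat"))
```

Both functions return values in the range 0 to 1, so they can be averaged per category alongside the binary LLM score.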
| #174 | ## 📚 Citation |
| #175 | |
| #176 | If you use this code or dataset in your research, please cite our paper: |
| #177 | |
| #178 | ```bibtex |
| #179 | @article{mem0, |
| #180 | title={Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory}, |
| #181 | author={Chhikara, Prateek and Khant, Dev and Aryan, Saket and Singh, Taranjeet and Yadav, Deshraj}, |
| #182 | journal={arXiv preprint arXiv:2504.19413}, |
| #183 | year={2025} |
| #184 | } |
| #185 | ``` |
| #186 | |
| #187 | ## 📄 License |
| #188 | |
| #189 | [MIT License](LICENSE) |
| #190 | |
| #191 | ## 👥 Contributors |
| #192 | |
| #193 | - [Prateek Chhikara](https://github.com/prateekchhikara) |
| #194 | - [Dev Khant](https://github.com/Dev-Khant) |
| #195 | - [Saket Aryan](https://github.com/whysosaket) |
| #196 | - [Taranjeet Singh](https://github.com/taranjeet) |
| #197 | - [Deshraj Yadav](https://github.com/deshraj) |
| #198 | |
| #199 |