---
title: Sentence Transformer
description: 'Local reranking with HuggingFace cross-encoder models'
---

The Sentence Transformer reranker runs HuggingFace cross-encoder models locally, so the reranking step never sends your data to an external API. This makes it a good fit for privacy-focused deployments that keep data on-premises.

## Models

Any HuggingFace cross-encoder model can be used. Popular choices include:

- `cross-encoder/ms-marco-MiniLM-L-6-v2`: Default, good balance of speed and accuracy
- `cross-encoder/ms-marco-TinyBERT-L-2-v2`: Fastest, smaller model size
- `cross-encoder/ms-marco-electra-base`: Higher accuracy, larger model
- `cross-encoder/stsb-distilroberta-base`: Good for semantic similarity tasks

## Installation

```bash
pip install sentence-transformers
```
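
You can confirm the package is importable before wiring it into mem0 with a small check like the one below. It is a sketch that uses only the standard library. `is_installed` is a hypothetical helper name. `sentence_transformers` is the import name of the `sentence-transformers` pip package.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the top-level module `module_name` can be imported."""
    # find_spec returns None when the module cannot be located on sys.path
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("sentence_transformers"):
        print("sentence-transformers is installed")
    else:
        print("Missing: run `pip install sentence-transformers`")
```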

## Configuration

```python Python
from mem0 import Memory

config = {
    "vector_store": {
        "provider": "chroma",
        "config": {
            "collection_name": "my_memories",
            "path": "./chroma_db"
        }
    },
    "llm": {
        "provider": "openai",
        "config": {
            "model": "gpt-4o-mini"
        }
    },
    "rerank": {
        "provider": "sentence_transformer",
        "config": {
            "model": "cross-encoder/ms-marco-MiniLM-L-6-v2",
            "device": "cpu",  # or "cuda" for GPU
            "batch_size": 32,
            "show_progress_bar": False,
            "top_k": 5
        }
    }
}

memory = Memory.from_config(config)
```

## GPU Acceleration

If a CUDA-capable GPU is available, set `device` to `"cuda"` for faster reranking:

```python Python
# Rerank section only; combine it with the vector_store and llm settings shown above
config = {
    "rerank": {
        "provider": "sentence_transformer",
        "config": {
            "model": "cross-encoder/ms-marco-MiniLM-L-6-v2",
            "device": "cuda",  # Use GPU
            "batch_size": 64  # larger batches suit GPUs with more memory
        }
    }
}
```
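
If the same code runs on machines with and without a GPU, you can choose the `device` value at runtime. The sketch below assumes PyTorch is the backend, since sentence-transformers depends on it. It falls back to `"cpu"` when PyTorch is missing or reports no CUDA device. `pick_device` is a hypothetical helper and is not part of mem0.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" if PyTorch can see a CUDA GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed, so CUDA cannot be used: stay on CPU
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
config = {
    "rerank": {
        "provider": "sentence_transformer",
        "config": {
            "model": "cross-encoder/ms-marco-MiniLM-L-6-v2",
            "device": device,
            # Mirror the batch sizes used in the examples on this page
            "batch_size": 64 if device == "cuda" else 32,
        },
    }
}
```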

## Usage Example

```python Python
from mem0 import Memory

# Initialize memory with local reranker
config = {
    "vector_store": {"provider": "chroma"},
    "llm": {"provider": "openai", "config": {"model": "gpt-4o-mini"}},
    "rerank": {
        "provider": "sentence_transformer",
        "config": {
            "model": "cross-encoder/ms-marco-MiniLM-L-6-v2",
            "device": "cpu"
        }
    }
}

memory = Memory.from_config(config)

# Add memories
messages = [
    {"role": "user", "content": "I love reading science fiction novels"},
    {"role": "user", "content": "My favorite author is Isaac Asimov"},
    {"role": "user", "content": "I also enjoy watching sci-fi movies"}
]

memory.add(messages, user_id="charlie")

# Search with local reranking
results = memory.search("What books does the user like?", user_id="charlie")

for result in results['results']:
    print(f"Memory: {result['memory']}")
    print(f"Vector Score: {result['score']:.3f}")
    print(f"Rerank Score: {result['rerank_score']:.3f}")
    print()
```
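
The loop in the usage example prints both scores for each memory. To see how reranking changes the ordering, you can sort the same results by each score. The sketch below works on dictionaries with the `memory`, `score` and `rerank_score` keys printed in that example. The sample values are invented for illustration, and `order_by` is a hypothetical helper.

```python
from typing import Dict, List


def order_by(results: List[Dict], key: str) -> List[str]:
    """Return memory texts sorted by `key`, highest score first."""
    ranked = sorted(results, key=lambda r: r[key], reverse=True)
    return [r["memory"] for r in ranked]


# Illustrative values only, not real output from mem0
sample = [
    {"memory": "Enjoys watching sci-fi movies", "score": 0.82, "rerank_score": 1.2},
    {"memory": "Loves reading science fiction novels", "score": 0.79, "rerank_score": 6.4},
    {"memory": "Favorite author is Isaac Asimov", "score": 0.75, "rerank_score": 3.1},
]

print("Vector order:", order_by(sample, "score"))
print("Rerank order:", order_by(sample, "rerank_score"))
```

The two scores come from different models and need not share a scale, so compare orderings rather than raw values.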

## Custom Models

You can use any HuggingFace cross-encoder model:

```python Python
# Using a different model
config = {
    "rerank": {
        "provider": "sentence_transformer",
        "config": {
            "model": "cross-encoder/stsb-distilroberta-base",
            "device": "cpu"
        }
    }
}
```

## Configuration Parameters

| Parameter | Description | Type | Default |
|-----------|-------------|------|---------|
| `model` | HuggingFace cross-encoder model name | `str` | `"cross-encoder/ms-marco-MiniLM-L-6-v2"` |
| `device` | Device to run model on (`cpu`, `cuda`, etc.) | `str` | `None` |
| `batch_size` | Batch size for processing documents | `int` | `32` |
| `show_progress_bar` | Show progress bar during processing | `bool` | `False` |
| `top_k` | Maximum documents to return | `int` | `None` |
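
The sketch below illustrates how `batch_size` and `top_k` interact. It scores each (query, document) pair in batches of `batch_size`, sorts the documents by score and keeps at most `top_k` of them. It is not mem0's implementation. `score_batch` stands in for a cross-encoder's predict call, and the word-overlap scorer in the demo is a toy placeholder for a real model.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Pair = Tuple[str, str]


def rerank(
    query: str,
    documents: Sequence[str],
    score_batch: Callable[[List[Pair]], List[float]],
    batch_size: int = 32,
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Score (query, doc) pairs batch by batch, then return the best top_k."""
    pairs = [(query, doc) for doc in documents]
    scores: List[float] = []
    for start in range(0, len(pairs), batch_size):
        # Each slice holds at most batch_size pairs, which bounds peak memory
        scores.extend(score_batch(pairs[start:start + batch_size]))
    ranked = sorted(zip(documents, scores), key=lambda item: item[1], reverse=True)
    # In this sketch, top_k=None keeps every document
    return ranked if top_k is None else ranked[:top_k]


def overlap_scores(pairs: List[Pair]) -> List[float]:
    """Toy scorer: fraction of query words that also appear in the document."""
    results = []
    for query, doc in pairs:
        q_words = set(query.lower().split())
        d_words = set(doc.lower().split())
        results.append(len(q_words & d_words) / max(len(q_words), 1))
    return results


docs = [
    "I also enjoy watching sci-fi movies",
    "I love reading science fiction novels",
    "My favorite author is Isaac Asimov",
]
print(rerank("reading science fiction", docs, overlap_scores, batch_size=2, top_k=2))
```

In sentence-transformers, `CrossEncoder.predict` accepts its own `batch_size` argument, so a real `score_batch` could forward the pairs to it directly.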

## Advantages

- Privacy: Reranking runs entirely locally, with no external API calls (other components, such as the configured LLM, may still call external services)
- Cost: No per-token charges after initial model download
- Customization: Use any HuggingFace cross-encoder model
- Offline: The reranker works without an internet connection once the model is downloaded
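
For offline use after the first download, the HuggingFace Hub client honours the `HF_HUB_OFFLINE` environment variable. When it is set, the client reads from the local cache and does not contact the Hub. The sketch below assumes the model is already cached and that the variable is set before `sentence_transformers` or mem0 is imported, because recent `huggingface_hub` versions read it at import time. `enable_offline_mode` is a hypothetical helper.

```python
import os
import sys
import warnings


def enable_offline_mode() -> None:
    """Tell the HuggingFace Hub client to use only locally cached files."""
    if "huggingface_hub" in sys.modules:
        # The flag is typically read at import time, so setting it now may be too late
        warnings.warn("huggingface_hub already imported; HF_HUB_OFFLINE may be ignored")
    os.environ["HF_HUB_OFFLINE"] = "1"


enable_offline_mode()
# Import mem0 (and therefore sentence-transformers) only after this point.
```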

## Performance Considerations

- First Run: The model is downloaded from the HuggingFace Hub on first use, which can take some time
- Memory Usage: The model occupies system RAM on CPU, or GPU memory on CUDA; larger models need more
- Batch Size: Tune `batch_size` to the available memory; larger batches usually improve throughput but use more memory
- Device: GPU acceleration significantly improves speed

## Best Practices

1. Model Selection: Choose a model based on your accuracy versus speed requirements
2. Device Management: Use a GPU when available for better performance
3. Batch Processing: Process multiple documents together for efficiency
4. Memory Monitoring: Monitor system memory usage when running larger models
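
When weighing accuracy against speed, it helps to time candidate models on a representative batch of your own (query, memory) pairs. The harness below is a sketch. `score_fn` is any callable that scores a list of pairs, such as the `predict` method of a `sentence_transformers.CrossEncoder` loaded once up front. The demo uses a trivial stand-in scorer so it runs without downloading a model. `median_latency_ms` and `length_scorer` are hypothetical names.

```python
import statistics
import time
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]


def median_latency_ms(
    score_fn: Callable[[List[Pair]], Sequence[float]],
    pairs: List[Pair],
    repeats: int = 5,
) -> float:
    """Call score_fn on `pairs` several times and return the median wall time in ms."""
    score_fn(pairs)  # warm-up call so one-off setup costs are not measured
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        score_fn(pairs)
        timings.append((time.perf_counter() - start) * 1000)
    return statistics.median(timings)


def length_scorer(pairs: List[Pair]) -> List[float]:
    """Stand-in scorer: shorter documents score higher. Replace with a real model."""
    return [1.0 / (1 + len(doc)) for _, doc in pairs]


pairs = [("What books does the user like?", "I love reading science fiction novels")] * 100
print(f"median latency: {median_latency_ms(length_scorer, pairs):.3f} ms")
```

Running the same harness for each candidate model on the same pairs gives a like-for-like latency comparison to set against each model's accuracy on your data.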