fix(evaluation): resolve BM25/Embedding index filename mismatch when using --from-conv/--to-conv #136
Open
Jah-yee wants to merge 4 commits into EverMind-AI:main from
- Rename stage3_memory_retrivel.py to stage3_memory_retrieval.py (typo fix)
- Replace == None with is None (Python anti-pattern)
- Replace != True with is not True (Python anti-pattern)
- Replace bare except with except Exception
- Remove duplicate 'rrf' entry in docstring
- Remove unused MongoDB init script volume mount from docker-compose.yaml
- Add missing env template setup step in STARTER_KIT.md quick start
Fixes: EverMind-AI#115, EverMind-AI#113, EverMind-AI#107, EverMind-AI#97, EverMind-AI#91, EverMind-AI#90, EverMind-AI#86
…v slicing
When using sliced runs (e.g. --from-conv 234 --to-conv 264), the index
files were being saved with sequential indices (0, 1, 2...) but search
was looking up with global conversation IDs (234, 235, 263), causing
'BM25 index not found' errors.
Changes:
- stage2_index_building.py: Use conversation_ids to name index files with
  extracted numeric IDs (e.g., 'bm25_index_conv_234.pkl')
- evermemos_adapter.py:
  - Pass conversation_ids to stage2 for proper file naming
  - Fix conv_id_to_index mapping to map conversation_id -> extracted
    numeric ID (not sequential index)
  - Update _check_missing_indexes to use proper file naming
  - Save conversation_index_mapping.json for debugging
This ensures index files and search lookups use consistent IDs.
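To illustrate the mismatch this commit describes, here is a minimal sketch. The helper names (`extract_numeric_id`, `bm25_index_filename`) and the `conv_234` ID format are assumptions made for illustration, not the project's actual code; only the `bm25_index_conv_234.pkl` naming pattern comes from the commit message.

```python
import re


def extract_numeric_id(conversation_id: str) -> str:
    """Return the trailing digits of a conversation ID (hypothetical helper)."""
    match = re.search(r"(\d+)$", conversation_id)
    if match is None:
        raise ValueError(f"no numeric suffix in {conversation_id!r}")
    return match.group(1)


def bm25_index_filename(conversation_id: str) -> str:
    """Build the index filename from the ID itself, not from its list position."""
    return f"bm25_index_conv_{extract_numeric_id(conversation_id)}.pkl"


# Assumed ID format for a sliced run starting at conversation 234.
conversation_ids = ["conv_234", "conv_235", "conv_236"]

# Old behaviour: names derived from the position in the slice.
sequential_names = [f"bm25_index_conv_{i}.pkl" for i in range(len(conversation_ids))]

# New behaviour: names derived from the conversation ID.
id_based_names = [bm25_index_filename(cid) for cid in conversation_ids]
```

With position-based names, a lookup for conversation 234 finds no file, because the slice produced `bm25_index_conv_0.pkl` instead.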
… content
This commit addresses issue EverMind-AI#131 by adding a 'full' query parameter to the GET /api/v1/memories endpoint. When full=True, the response includes the complete episode field, which is not returned by default for backward compatibility.
Changes:
- Add 'full' parameter to FetchMemRequest DTO
- Add 'episode' field to EpisodicMemoryModel (optional, returned only when full=True)
- Update find_memories method to accept 'full' parameter
- Update _convert_episodic_memory to conditionally include episode content
- Update memory_manager to pass 'full' parameter to fetch service
This allows external benchmarks and third-party integrations to access the full episodic memory content for auditing and verification purposes.
See: EverMind-AI#131
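The conditional inclusion of the episode field could look roughly like the sketch below. It uses plain dataclasses and hypothetical fields (`memory_id`, `summary`) and a hypothetical record layout; the real EpisodicMemoryModel and _convert_episodic_memory may be structured differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EpisodicMemoryModel:
    memory_id: str                 # hypothetical field
    summary: str                   # hypothetical field
    episode: Optional[str] = None  # populated only when full=True


def convert_episodic_memory(record: dict, full: bool = False) -> EpisodicMemoryModel:
    """Convert a stored record, including the episode text only on request."""
    return EpisodicMemoryModel(
        memory_id=record["id"],
        summary=record["summary"],
        episode=record.get("episode") if full else None,
    )


# Hypothetical stored record.
record = {"id": "m1", "summary": "short summary", "episode": "complete episode text"}

default_view = convert_episodic_memory(record)         # episode omitted
full_view = convert_episodic_memory(record, full=True)  # episode included
```

Defaulting `full` to False keeps existing callers' responses unchanged, which matches the backward-compatibility goal stated in the commit message.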
…using --from-conv/--to-conv
When running evaluation with sliced conversation ranges (e.g., --from-conv 234 --to-conv 264), the index files were built with actual conversation IDs (e.g., bm25_index_conv_234.pkl) but the retrieval stage was looking for sequential indices (e.g., bm25_index_conv_0.pkl), causing empty retrieval results.
This fix:
- Reads conversation_ids from config (same as stage2_index_building.py)
- Extracts the numeric ID from conversation_id for file naming
- Falls back to sequential indices for backward compatibility
Fixes EverMind-AI#127
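The lookup with fallback described in this commit might be sketched as follows. The function name `conv_ids_for_files`, its signature, and the `conv_234` ID format are assumptions; only the `conv_id_for_file` variable name and the fallback behaviour come from the PR text.

```python
import re
from typing import List, Optional, Sequence


def conv_ids_for_files(
    num_conversations: int,
    conversation_ids: Optional[Sequence[str]] = None,
) -> List[str]:
    """Return the ID used in each index filename (hypothetical helper)."""
    ids = []
    for i in range(num_conversations):
        # Fallback: the sequential loop index, as before the fix.
        conv_id_for_file = str(i)
        if conversation_ids and i < len(conversation_ids):
            # Prefer the numeric suffix of the configured conversation ID.
            match = re.search(r"(\d+)$", conversation_ids[i])
            if match:
                conv_id_for_file = match.group(1)
        ids.append(conv_id_for_file)
    return ids


# Without conversation_ids in the config, naming stays sequential.
legacy = conv_ids_for_files(2)

# With a sliced run, names follow the actual conversation IDs.
sliced = conv_ids_for_files(2, ["conv_234", "conv_235"])
```

Keeping the sequential index as the default means runs that never set conversation_ids still find the index files they built earlier.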
Good day,
Problem
When running evaluation with sliced conversation ranges (e.g., --from-conv 234 --to-conv 264), the BM25/Embedding index files were built with actual conversation IDs (e.g., bm25_index_conv_234.pkl), but the retrieval stage looked for sequential indices (e.g., bm25_index_conv_0.pkl), causing empty retrieval results and incorrect evaluation scores.
Root Cause
In stage3_memory_retrieval.py, the index loading code used sequential loop indices ({i}) directly without checking whether conversation_ids was provided in the config. Meanwhile, stage2_index_building.py correctly extracted conversation IDs from the config for file naming.
Fix
This PR adds the same logic from stage2_index_building.py to stage3_memory_retrieval.py:
- Reads conversation_ids from config
- Extracts the numeric ID from conversation_id for file naming
- Falls back to sequential indices for backward compatibility
Changes
- Updated evaluation/src/adapters/evermemos/stage3_memory_retrieval.py to use the conv_id_for_file variable for index file naming
Testing
The syntax has been verified with python3 -m py_compile.
Thank you for your dedication; I hope this helps. If there is a problem with my fix or anything that needs further discussion, please leave a comment below and I will address it.
Warmly,
Jah-yee