16 commits
e0c9108
spec-005: librarian sub-package + agent + prompt v1.0.0 + SS API key …
jeremymanning May 6, 2026
cb7cb6a
spec-005: US1 unit tests for librarian core (50 new tests, all pass) …
jeremymanning May 6, 2026
3cf225d
spec-005: US2 expansion + Search trail tests (24 new tests, all pass)…
jeremymanning May 6, 2026
f029dfc
spec-005: US4 cross-domain coverage (8 fields PASS) + induced failure…
jeremymanning May 7, 2026
d6abaa3
spec-005: rewire flesh_out + soft-deprecate citation_fetcher + citati…
jeremymanning May 7, 2026
c8ae4a8
spec-005: deliberate state edit — roll PROJ-261 back to flesh_out_in_…
jeremymanning May 7, 2026
7f47f02
spec-005: flesh_out re-run on PROJ-261 with librarian-backed lit sear…
jeremymanning May 7, 2026
d110c37
spec-005: US3 Phase 1 re-validation on PROJ-261/262 (Phase 7 complete…
jeremymanning May 7, 2026
602aa42
spec-005: diagnostic report (Phase 8 / US5, T049-T059, #107)
jeremymanning May 7, 2026
cc38ffa
spec-005: carry-forward manifest names canonicals for spec 006 (Phase…
jeremymanning May 7, 2026
02c8a70
spec-005: polish — lint clean + FR-022 enforcement test + spec In Rev…
jeremymanning May 7, 2026
5c267ca
spec-005: tick T068-T070 (push + PR + tracker comment) (#107)
jeremymanning May 7, 2026
260ddd2
spec-005 fix-up: P5-D08 — relevance gate in verify_citation (CRITICAL)
jeremymanning May 7, 2026
d582a0a
spec-005 fix-up #2: P5-D10 — LLM-based topical-relevance judge (CRITI…
jeremymanning May 7, 2026
2712d24
spec-005 fix-up #3: P5-D11 — concept-decomposed query extractor (CRIT…
jeremymanning May 8, 2026
cb5a5ba
spec-005 fix-up #4: P5-D12 — judge ACCEPT categories + extractor empi…
jeremymanning May 10, 2026
2 changes: 1 addition & 1 deletion .specify/feature.json
@@ -1,3 +1,3 @@
{
- "feature_directory": "specs/003-phase1-idea-lifecycle-testing"
+ "feature_directory": "specs/005-librarian-agent"
}
2 changes: 1 addition & 1 deletion CLAUDE.md
@@ -70,5 +70,5 @@ Since this is primarily a research documentation repository without traditional
<!-- SPECKIT START -->
For additional context about technologies to be used, project structure,
shell commands, and other important information, read the current plan:
- [specs/004-phase2-project-bootstrap-testing/plan.md](specs/004-phase2-project-bootstrap-testing/plan.md).
+ [specs/005-librarian-agent/plan.md](specs/005-librarian-agent/plan.md).
<!-- SPECKIT END -->
94 changes: 94 additions & 0 deletions agents/prompts/librarian.md
@@ -0,0 +1,94 @@
# Librarian Agent

**Version**: 1.0.0
**Stage owned**: tool-style; invoked by other agents (`flesh_out`, `reference_validator`, future paper-side agents) — does NOT advance project state itself.
**Default backend**: dartmouth (fallback huggingface, then local)

## Purpose

Single canonical source of truth for **literature search + citation verification** in the llmXive pipeline. Replaces three pre-existing duplicate implementations (Constitution Principle I):

1. `agents/tools/lit_search.py` — used by flesh_out's lit_search call
2. `src/llmxive/agents/reference_validator.py` — primary-source comparison logic
3. `tests/phase1/citation_resolver.py` — Stage-1 mechanical resolver

The librarian:
1. Accepts a search term + optional context (project field, idea body excerpt, target citation count).
2. Issues real keyword searches against Semantic Scholar Graph API + arXiv API.
3. For each candidate citation, runs the canonical 3-check verification (URL resolves → title-token-overlap ≥0.7 → summary-grounded ≥0.5).
4. For a sample of at least 10% of the returned verified citations, downloads the full PDF and re-verifies summary-grounding (Q2: adaptive depth audit).
5. When fewer than `target_n` (default 5) verified citations are found, triggers a **multi-step expanded search** (this prompt's primary LLM use):
   - LLM-brainstorms 10-20 alternative phrasings ranked by relevance
   - Iterates over the expanded list, accumulating verified citations until ≥target_n found OR list exhausted (hard cap of 20 terms)
6. Returns structured JSON per `specs/005-librarian-agent/contracts/librarian-json-output.md`.
7. If a calling project's idea.md path is provided, appends or replaces a `## Search trail` subsection per `specs/005-librarian-agent/contracts/search-trail-md.md`.

The agent's **mechanical** parts (search, verify, PDF sample, cache) do not require LLM calls. The LLM is invoked **only** for the term-expansion step (this prompt's content).
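
The three checks in step 3 can be illustrated with a minimal sketch. It assumes both thresholds are applied to a simple token-overlap ratio; the diff does not show how `verify_citation()` actually measures title overlap or summary grounding, so `overlap` and `passes_checks` below are hypothetical names, not the librarian's API.

```python
import re

TITLE_OVERLAP_MIN = 0.7     # threshold quoted in step 3
SUMMARY_GROUNDED_MIN = 0.5  # threshold quoted in step 3


def _tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens (assumed tokenizer, for illustration only)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(claimed: str, fetched: str) -> float:
    """Fraction of the claimed text's tokens that also appear in the fetched text."""
    claimed_tokens = _tokens(claimed)
    if not claimed_tokens:
        return 0.0
    return len(claimed_tokens & _tokens(fetched)) / len(claimed_tokens)


def passes_checks(url_resolves: bool, claimed_title: str, fetched_title: str,
                  summary: str, fetched_abstract: str) -> bool:
    """Apply the three checks in order, stopping at the first failure."""
    if not url_resolves:
        return False
    if overlap(claimed_title, fetched_title) < TITLE_OVERLAP_MIN:
        return False
    return overlap(summary, fetched_abstract) >= SUMMARY_GROUNDED_MIN
```

Ordering the checks from cheapest to most expensive means a dead URL or a mismatched title short-circuits before any summary comparison is attempted.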

## Inputs

- `term` (str): the original search term to be expanded.
- `context.field` (str, optional): the calling project's field (e.g., "computer science", "biology") — disambiguates terms with cross-domain meaning (e.g., "attention" in CS vs neuroscience).
- `context.idea_body_excerpt` (str, optional): first 1000 chars of the calling project's `idea/<slug>.md`, providing topical context for the expansion.
- `context.target_n` (int, default 5): the verified-citation count we're trying to reach.

## Output contract

A numbered list of 10-20 alternative phrasings, ranked by relevance, ONE PER LINE. Format:

```
1. <alternative phrase 1>
2. <alternative phrase 2>
3. ...
```

The downstream parser (`src/llmxive/librarian/expand.py:_parse_ranked_terms`) is tolerant: it accepts numbered lists (`1.`, `1)`, `1]`) and bullet lists (`-`, `*`, `•`), and it ignores section headers (`##`, `###`) and explanatory prose. Sticking to the canonical numbered-list format nonetheless keeps the parse deterministic.
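
A tolerant parser of this kind might look like the sketch below. It is not the real `_parse_ranked_terms`, whose source is not part of this diff. The function name `parse_ranked_terms` is hypothetical, and dropping duplicates, the original term, and `...` placeholders are assumptions made for illustration.

```python
import re

# Leading list marker: "1.", "1)", "1]", or a bullet "-", "*", "•".
_MARKER = re.compile(r"^\s*(?:\d+[.)\]]|[-*•])\s+(.*\S)\s*$")


def parse_ranked_terms(text: str, original: str = "", cap: int = 20) -> list[str]:
    """Extract list items in order, skipping headers, prose, and repeats."""
    terms: list[str] = []
    # Assumption: the original term is filtered out, since the caller already tried it.
    seen = {original.strip().lower()} if original.strip() else set()
    for line in text.splitlines():
        match = _MARKER.match(line)
        if match is None:
            continue  # headers, prose, code fences, and blank lines carry no marker
        term = match.group(1).strip().strip('"')
        key = term.lower()
        if not term or key in seen or set(term) <= {"."}:
            continue  # empty item, duplicate, or a "..." placeholder
        seen.add(key)
        terms.append(term)
        if len(terms) == cap:
            break  # hard cap of 20 terms
    return terms
```

Because input order is preserved, the model's relevance ranking carries through to the search loop unchanged.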

## Rules

- **DO NOT repeat the original term verbatim.** The caller has already tried it.
- **DO produce 10-20 terms.** Fewer than 10 risks exhausting the expansion before reaching target_n; more than 20 wastes budget (hard cap enforced).
- **Rank by relevance to the originating context.** Most-relevant terms first.
- **Include a mix of**:
  - **Synonyms** (e.g., "code clones" → "duplicated source code")
  - **Sub-area terms** (narrower scope; e.g., "transformer attention" → "scaled dot-product attention")
  - **Domain-adjacent terms** (e.g., "code duplication LLM" → "AI-generated code redundancy")
  - **More-general terms** (broader scope; e.g., "self-attention" → "neural attention mechanisms")
- **Avoid generic terms** that would surface unrelated papers (e.g., for a transformer-attention query, don't include "deep learning" or "machine learning" — too broad).
- **Use the project's field as a disambiguation lens.** "Attention" in CS context should NOT be expanded to "selective attention" (psychology); in psychology context, "attention" should NOT be expanded to "self-attention" (CS).
- **Output ONLY the numbered list.** No explanatory prose, no code blocks, no markdown headers. The downstream parser tolerates stray content, but stray content makes the output less reproducible.

## Example

For original term `"transformer attention"` in field `"computer science"`:

```
1. self-attention mechanisms
2. multi-head attention
3. scaled dot-product attention
4. transformer encoder layers
5. attention is all you need
6. softmax attention weights
7. positional encoding transformer
8. sequence-to-sequence attention
9. neural attention model
10. encoder-decoder attention
11. cross-attention
12. masked self-attention
```

For original term `"code duplication LLM perplexity"` in field `"computer science"`:

```
1. code clones language model perplexity
2. duplicated source code LLM evaluation
3. repeated code patterns model accuracy
4. AI code redundancy
5. token-level redundancy language models
6. ...
```

## Failure handling

- If the model cannot generate 10 distinct alternative terms (e.g., the original term is already maximally specific), it MAY return fewer (down to 5). The orchestrator handles "<10 terms returned" gracefully — the expanded search just iterates over whatever is provided.
- If the model returns generic terms (e.g., "machine learning" for any CS query), the verification step will reject most candidates and the result will likely be `outcome: "exhausted"`. This is acceptable; the caller decides the next action per Q3.
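
The orchestration described in this prompt (iterate over the expanded terms, accumulate verified citations, stop at `target_n` or after 20 terms) can be sketched as follows. Everything except the `"exhausted"` outcome, the default `target_n` of 5, and the 20-term cap is an assumption: `expanded_search`, the `search_and_verify` callback, the `"ok"` outcome value, and the returned keys are hypothetical and do not reproduce the JSON contract in `specs/005-librarian-agent/contracts/librarian-json-output.md`.

```python
from typing import Callable, Iterable

HARD_CAP_TERMS = 20  # hard cap on expanded terms, per the prompt


def expanded_search(
    expanded_terms: Iterable[str],
    search_and_verify: Callable[[str], list[str]],
    already_verified: Iterable[str] = (),
    target_n: int = 5,
) -> dict:
    """Accumulate verified citation ids until target_n is reached or terms run out."""
    verified = list(dict.fromkeys(already_verified))  # de-duplicate, keep order
    tried: list[str] = []
    for term in list(expanded_terms)[:HARD_CAP_TERMS]:
        if len(verified) >= target_n:
            break
        tried.append(term)
        # search_and_verify is a hypothetical callback returning verified ids for one term.
        for citation_id in search_and_verify(term):
            if citation_id not in verified:
                verified.append(citation_id)
    # "exhausted" comes from the prompt; "ok" is an assumed placeholder value.
    outcome = "ok" if len(verified) >= target_n else "exhausted"
    return {"outcome": outcome, "verified": verified, "terms_tried": tried}
```

A short or low-quality expansion list needs no special handling here: the loop simply runs out of terms and reports `"exhausted"`.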
17 changes: 17 additions & 0 deletions agents/registry.yaml
@@ -95,6 +95,23 @@ agents:
default_model: qwen.qwen3.5-122b
wall_clock_budget_seconds: 300
paid_opt_in: false
- name: librarian
purpose: Canonical literature-search-and-verification agent (spec 005). Replaces
duplicate implementations in lit_search + reference_validator + citation_resolver.
Tool-style; invoked by other agents.
inputs:
- idea
outputs:
- idea
prompt_path: agents/prompts/librarian.md
prompt_version: 1.5.0
default_backend: dartmouth
fallback_backends:
- huggingface
- local
default_model: qwen.qwen3.5-122b
wall_clock_budget_seconds: 600
paid_opt_in: false
- name: specifier
purpose: Drive /speckit.specify for the project; draft spec.md from the idea.
inputs:
26 changes: 26 additions & 0 deletions agents/tools/citation_fetcher.py
@@ -1,5 +1,31 @@
"""Citation-fetcher tool (T108).

⚠️ **Soft-deprecated post spec 005 (2026-05-06)** — this module's
title-overlap verification logic duplicates ``llmxive.librarian.verify.
verify_citation()``. New callers MUST use the librarian directly:

from llmxive.librarian.verify import verify_citation

This file remains in place because:
- The Reference-Validator Agent at
``src/llmxive/agents/reference_validator.py`` consumes this
module's ``FetchResult`` shape (with a ``VerificationStatus``
enum) which differs from the librarian's richer
``VerifiedCitation`` / ``VerificationFailure`` split.
- Adapting reference_validator + its tests to the librarian shape
is non-trivial; it was DEFERRED from spec 005 to a follow-up
issue (per spec.md FR-014/15) to keep spec 005's blast radius
contained. See ``notes/2026-05-06-spec-005-librarian-outline.md``
for context.
- The librarian's verification logic IS the canonical
implementation going forward; this module's ``fetch_citation()``
will be progressively migrated by the follow-up issue.

FR-022 (no new duplicates): adding a NEW caller of this module is
forbidden. Use the librarian. The CI test at
``tests/phase2/test_no_duplicate_lit_search.py`` (T070a) enforces
this.

Resolves a citation to its primary source and returns
`{fetched_title, fetched_authors, status}`. Distinguishes:
- `verified` — primary source reachable AND title-overlap ≥ threshold