Lit Ingest
Ingest ONE raw source (a sources/ PDF or .txt path, or pasted text) into the Literature Vault — read it, write a vault/summaries/<bibkey>.md annotated page with claims+locators and verbatim quotes,.
Core Rule
Ingest is extraction, not authorship. Everything in the summary and the proposed
.bib entry comes FROM the source document in front of you — never from your prior, never
from what a paper with this title "probably" says. A summary that asserts a finding the
source does not state, or a .bib entry with a guessed DOI, is a fabrication — the one
thing this kit exists to prevent. Metadata you cannot read with confidence is left as
[VALUE — verify], never filled in. Quotes are verbatim and carry a locator. You read
one source and turn it into one vault/summaries/<bibkey>.md page plus a proposed
references.bib entry; you never invent a second source to go with it.
When to Use
Invoke with /lit-ingest <source> when:
- You have added a paper to
sources/(PDF/txt/md) and want it in the vault. - You want to paste extracted text or notes for a source and have it summarized + linked.
/literature-reviewreported a gap, you fetched a real paper to fill it, and now need it ingested so the next synthesis can use it.- A
references.bibentry exists asstatus: metadata-onlyand you now have the full text to upgrade its page.
<source> is a path under sources/ (e.g. /lit-ingest sources/doe2023-tooluse.pdf) or
the word paste followed by the text. One source per invocation — batch ingests go through
the vault-maintainer agent so the main thread stays clean.
Process
Phase 1 — Read the source (never modify it)
- Read
VAULT.md(the schema) andMANUSCRIPT_MAP.md → Thesisso the summary's "Relevance to thesis" section is grounded in the actual argument. - Read the raw source from
sources/(use thepdfskill for PDFs) or take the pasted text. Treat it as immutable —protect-sources.shblocks edits tosources/; never try to "clean up" or rewrite the raw file. - If the text is unreadable (scanned image, OCR garbage, truncated paste), say so and stop — do not summarize a document you cannot actually read. An honest "cannot read this PDF" beats a confident summary of nothing.
Phase 2 — Extract metadata & choose the cite key
From the document's own front matter (title block, author list, venue line, DOI on page 1):
- Pull title, authors, year, venue. For any field you cannot read with confidence, write
[VALUE — verify]— never guess a year or a venue from the title. - Choose a cite key as
firstauthorYEARkeyword(e.g.doe2023tooluse). If the year is unknown, the key still needs to be stable — pick the keyword and flag the year. Keep it consistent with any existing scheme inreferences.bib. - The page filename is exactly the cite key:
vault/summaries/<bibkey>.md. One summary per.bibkey (Rule 4 inVAULT.md).
Phase 3 — Write the summary page
Create vault/summaries/<bibkey>.md from vault/_templates/source.md, filling every
section per the VAULT.md source-page schema:
- Summary — 2–4 sentences in your words, grounded in the text.
- Key claims (with locators) — each claim tagged with a page/§ locator and a strength
verb (
shows/reports/suggests). No locator → not a logged claim. - Method — enough to judge the evidence's weight (design, sample, what was measured).
- Findings — the actual results with numbers and locators. Copy numbers exactly; if
a number is unreadable, write
[VALUE — verify], never approximate one. - Limitations / scope — what the source does NOT establish. This is the guardrail that stops a later draft from overclaiming it.
- Relevance to thesis — supports / contradicts / frames the current argument.
- Quotes (verbatim + locator) — exact wording with
p. <n>. A quote you cannot place to a page does not go in. - Open questions / follow-ups — citation-chaining leads (papers it cites worth fetching), things to verify.
Set status: honestly: read (full text digested), skimmed (read partially — flag what
you did not cover), or metadata-only (no full text — you may record its existence but must
NOT assert findings from it).
Phase 4 — Propose the references.bib entry (extracted, not invented)
Propose a BibTeX entry built from the metadata you extracted in Phase 2:
@inproceedings{doe2023tooluse,
title = {Tool-augmented language models for single-turn QA},
author = {Doe, Jane and Roe, Richard},
booktitle = {Proceedings of ExampleCL},
year = {2023},
doi = {[VALUE — verify]}
}- Never fabricate a DOI. If page 1 carries no DOI, leave it
[VALUE — verify]or omit the field.block-fabrication.shrejects placeholder/fake-shaped DOIs (10.xxxx,example.com,TODO) — so a half-known reference surfaces for the author instead of slipping in as real. That is the system working, not a failure. - Show the entry for the author to paste into
references.bib; do not silently assume it is already there. (Writing the.bibis the author's call;citation-gate.shlater confirms the\citeresolves.) - If the cite key already exists in
references.bib, reconcile rather than duplicate — flag any mismatch between the existing entry and what the document says.
Phase 5 — Cross-reference concepts & entities
Wire the new page into the vault with [[wikilinks]]:
- For each cross-source idea the source bears on, link/create
vault/concepts/<concept>.md(fromvault/_templates/concept.md) — e.g.[[tool-call-hallucination]],[[verification-gating]]. Add this source to the concept's "Sources that bear on it" with its stance, and note any tension with sources already listed. - For each benchmark / dataset / method / model / group, link/create
vault/entities/<entity>.md(fromvault/_templates/entity.md) — e.g.[[NeurIPS-tooluse-benchmark]]. Record how this source uses it and any caveat (contamination, version drift). - A
[[link]]to a page that does not exist yet is fine — it marks something worth writing. Do not invent the linked page's contents; create a stub and move on.
Phase 6 — Update index + log (mandatory)
- Add a line to
vault/index.md → ## Sources (by cite key), newest first:- [[doe2023tooluse]] — Tool-augmented LMs for single-turn QA (Doe & Roe, 2023) — read. List any new concept/entity pages under their headings. - Append one line to
vault/log.md(append-only, newest at the bottom):2026-06-03 ingest doe2023tooluse — created summary, proposed .bib entry, linked [[verification-gating]].
Never rewrite the log; never skip the index. Per VAULT.md Rule 3, no ingest is complete
without both.
Output Format
- The summary page — written to
vault/summaries/<bibkey>.md, conforming to the schema. - Proposed
.bibentry — shown in a fenced block for the author to paste; uncertain fields as[VALUE — verify], no invented DOI. - Cross-reference report — which
concepts/andentities/pages were created or touched, and the new[[wikilinks]]. - Index + log diff — the lines added.
- Flags — every
[VALUE — verify]left open, thestatus:set, and any source passage that was unreadable. End with the honest tally:(claims logged / quotes captured / fields left to verify).
Pairs With
/literature-review— the consumer: it synthesizes from the summaries this skill writes. Ingest a fetched source here, then re-run the review to close the gap honestly.fact-checkeragent — readssources/(not the summary) for exact wording; the summary's locators tell it where to look.vault-maintaineragent — dispatch it for batch ingest / heavy cross-referencing so the main thread stays clean.block-fabrication.sh(PreToolUse) — rejects the proposed.bibentry if it carries a placeholder/fake DOI. Flag it[VALUE — verify], do not invent one.citation-gate.sh(PostToolUse) — later confirms the\cite{<bibkey>}resolves once the entry is inreferences.bib./lit-lint— run after a batch of ingests to catch orphans and missing locators.
Notes
- Never modify
sources/. Derive intovault/; the raw file is evidence and stays byte-for-byte intact (protect-sources.shenforces this). - The summary is a convenience layer, never an override: when
/literature-revieworfact-checkerneeds exact wording, they return tosources/, not the summary (VAULT.mdRule 5). metadata-onlyis a real, valid status — a citation without the content. Record the source's existence; do NOT assert its findings until the full text is ingested.- Synthesis and judging relevance are Reasoner-tier work (see
CLAUDE.md → Model Selection); mechanical metadata extraction is fine on the Drafter. - If ingesting reveals the source contradicts a claim already in the manuscript, that is a finding — surface it; do not quietly smooth it over.
Literature Vault (VAULT.md)
Literature Vault schema — annotated bibliography, source/concept/entity pages, ingest workflow.
Lit Lint
Health-check the Literature Vault — find cross-source contradictions, orphan pages (a summary with no matching references.bib key, or a .bib key with no summary), broken [[wikilinks]], claims missing.