We read the papers so you can ship the changes. Each article distills a recent arXiv paper on LLM retrieval, citation behavior, or AI-driven search into something you can act on.
Why “2x on ChatGPT” stories can be misleading without a tailwind control
arXiv:2606.04362
When Cultural Knowledge Doesn't Transfer to Cultural Reasoning
arXiv:2606.01879
How Prompt Language Rewrites Cultural Knowledge Before the Model Even Answers
arXiv:2605.30481
Where LLM Fact-Checkers Go Wrong on Sources
arXiv:2605.30241
When the Agent Learns From Its Own Corrections
arXiv:2606.02215
Google AI Overviews drove a 12% rise in Reddit engagement, but AI Mode reversed those gains for experiential communities by substituting conversation for human discussion.
arXiv:2605.16428
Coordinating a multi-page evidence ecosystem raises LLM search agent recommendation rates by up to 31 percentage points over the best single-page GEO baseline.
arXiv:2605.12887
An audit of ChatGPT, Copilot, Gemini, and Perplexity finds ~16% of cited sources are AI-generated, with Copilot citing synthetic content in nearly 3 of every 10 citations.
arXiv:2605.23684
Six frontier LLMs hallucinate 12–38% of scientific citations; a new agentic retrieval system hits zero hallucination at 30% better F1 and $0.05 per query.
arXiv:2605.14306
Cosmetic prompt rewording drops AI brand-recommendation overlap by 21–32 percentage points, more divergence than switching providers entirely, across 12,000 runs.
arXiv:2605.27440
Baidu's Aurora-Expiry uses RAG-augmented LLMs to infer query-specific expiration thresholds, cutting median document age 12.81% for time-sensitive queries in a 14-day live A/B test.
arXiv:2605.13052
A 37,000-run audit of 533 brands finds RAG preserves the brand hierarchy: L4–L5 specialists face 48–52% invisibility while L1 leaders surface universally but convert at only 25–41%.
arXiv:2605.27439
Schema.org markup gives retrieval agents 65.7% higher FAIR-compliant precision, but cuts query coverage by 29% where publishers haven't adopted it.
arXiv:2605.28787
A Microsoft study shows a single false top search result drops GPT-5 accuracy from 65% to 18%, while humans solve the same queries at 93%, exposing a critical gap in agentic RAG deployments.
arXiv:2603.00801
A new paper from Virginia Tech maps four failure modes that prevent pages from being cited in AI-generated responses. 43% of relevant pages receive zero citations under baseline conditions.
arXiv:2603.09296
No LLM verifies even half its citations under any tested condition, and adding temporal cutoffs or other deployment constraints collapses verifiability to near zero.
arXiv:2603.07287
A new statistical framework shows that single-run citation share metrics from Perplexity, SearchGPT, and Gemini carry confidence intervals wide enough to make most apparent SEO gains statistically indistinguishable from noise.
arXiv:2603.08924
Rewriting pages as entity documents lifted AI answer accuracy ~30%; adding JSON-LD did almost nothing: it gets cut before indexing.
arXiv:2603.10700