Research — BLURSOR.ai

01

Jul 21, 2026 10 min read

DRNoise shows how one plausible false document can knock deep research agents off course

One misleading “direct claim” can derail an agent that’s otherwise correct

arXiv:2607.17291

02

Jul 20, 2026 9 min read

Chinese generative search cites only a small slice of available brand sources, and external quality scores do not predict what surfaces

Why citation coverage is sparse—and what actually predicts which sources get surfaced

arXiv:2607.15771

03

Jul 16, 2026 8 min read

GEO can change citations inside a fixed context, but it doesn’t show durable organic visibility

Why GEO gains don’t translate into durable visibility (and which levers actually hold)

arXiv:2607.14035

04

Jul 16, 2026 9 min read

LLM brand answers are mostly unstable because language changes the signal

Variance components explain why brand answers won’t stabilize—and what to sample instead

arXiv:2607.13304

05

Jun 25, 2026 9 min read

LLMs mostly source “brand reputation” from other people’s pages

What LLMs treat as “brand reputation” is mostly other people’s web pages, not the brands themselves

arXiv:2606.25787

06

Jun 23, 2026 10 min read

AI brand “ownership” is moderately concentrated, but the winner changes by model

Who gets the “top pick” in AI recommendations—and how consistent is it across models?

arXiv:2606.23057

07

Jun 23, 2026 10 min read

English-only AI reputation monitoring misses local champions in multilingual markets

English-language prompts create a measurable “local-visibility” blind spot across languages

arXiv:2606.23165

08

Jun 23, 2026 9 min read

AI visibility breaks down by entity, not just by mention count

Why “mention” counts fail: fabricated citations scale differently by entity and query context

arXiv:2606.21595

09

Jun 19, 2026 9 min read

AI search visibility starts with brand stature, not prompt tweaks

Why the first-run visibility gap between big brands and everyone else is so persistent

arXiv:2606.20065

10

Jun 12, 2026 9 min read

One Polluted Page Is Enough to Hijack LLM Recommendations

Why one poisoned search result is enough to hijack LLM recommendations

arXiv:2606.13610

11

Jun 9, 2026 9 min read

Safety-trained RAG models can turn a prompt injection into brand suppression

Why safety alignment can turn retrieval-time injections into brand-level anti-promoters

arXiv:2606.09204

12

Jun 4, 2026 10 min read

ChatGPT referral spikes can overstate AEO unless you control for platform growth

Why “2x on ChatGPT” stories can be misleading without a tailwind control

arXiv:2606.04362

13

Jun 2, 2026 7 min read

LLMs Score 94% on Cultural Knowledge Tests and 40% When the Answer Choices Are Removed

When Cultural Knowledge Doesn't Transfer to Cultural Reasoning

arXiv:2606.01879

14

Jun 2, 2026 7 min read

English Prompts Suppress Bengali Cultural Knowledge Even When Local Evidence Is Provided

How Prompt Language Rewrites Cultural Knowledge Before the Model Even Answers

arXiv:2605.30481

15

Jun 2, 2026 7 min read

LLM Fact-Checkers Score Well But Retrieve the Wrong Sources

Where LLM Fact-Checkers Go Wrong on Sources

arXiv:2605.30241

16

Jun 2, 2026 7 min read

An LLM Agent That Learns From Its Own Mistakes Beats Human Fact-Checkers on Health Misinformation 89% of the Time

When the Agent Learns From Its Own Corrections

arXiv:2606.02215

17

Jun 1, 2026 7 min read

AI Overviews Sent Users to Reddit. AI Mode Is Taking Them Back.

Google AI Overviews drove a 12% rise in Reddit engagement, but AI Mode reversed those gains for experiential communities by substituting AI conversation for human discussion.

arXiv:2605.16428

18

Jun 1, 2026 7 min read

Ecosystem GEO Beats Page-Level Optimization by Up to 31 Points for Agent Search

Coordinating a multi-page evidence ecosystem raises LLM search agent recommendation rates by up to 31 percentage points over the best single-page GEO baseline.

arXiv:2605.12887

19

Jun 1, 2026 7 min read

When AI Cites AI: The Synthetic Source Problem in Generative Search

An audit of ChatGPT, Copilot, Gemini, and Perplexity finds ~16% of cited sources are AI-generated — with Copilot citing synthetic content in nearly 3 of every 10 citations.

arXiv:2605.23684

20

Jun 1, 2026 6 min read

Frontier LLMs Hallucinate Up to 38% of Scientific Citations

Six frontier LLMs hallucinate 12–38% of scientific citations; a new agentic retrieval system hits zero hallucination at 30% better F1 and $0.05 per query.

arXiv:2605.14306

21

Jun 1, 2026 6 min read

Rewording a Buying Question Changes the Brands AI Recommends More Than Switching Models Does

Cosmetic prompt rewording drops AI brand-recommendation overlap by 21–32 percentage points — more divergence than switching providers entirely, across 12,000 runs.

arXiv:2605.27440

22

Jun 1, 2026 6 min read

Query-Specific Expiry: Why 'Recent' Isn't the Same as 'Fresh'

Baidu's Aurora-Expiry uses RAG-augmented LLMs to infer query-specific expiration thresholds, cutting median document age 12.81% for time-sensitive queries in a 14-day live A/B test.

arXiv:2605.13052

23

Jun 1, 2026 8 min read

RAG Doesn't Flatten the Brand Hierarchy — It Just Moves Where You Lose

A 37,000-run audit of 533 brands finds RAG preserves the brand hierarchy: L4–L5 specialists face 48–52% invisibility while L1 leaders surface universally but convert at only 25–41%.

arXiv:2605.27439

24

May 31, 2026 7 min read

Semantic Metadata Makes Agents More Reliable, Not Smarter

Schema.org markup gives retrieval agents 65.7% higher FAIR-compliant precision — but cuts query coverage by 29% where publishers haven't adopted it.

arXiv:2605.28787

25

May 26, 2026 6 min read

One Bad Search Result Breaks Frontier AI Agents — Completely

A Microsoft study shows a single false top search result drops GPT-5 accuracy from 65% to 18% — while humans solve the same queries at 93% — exposing a critical gap in agentic RAG deployments.

arXiv:2603.00801

26

May 22, 2026 8 min read

AI Research Agents Fail in Ways Their Own Tests Can't See

A new SoK survey of 118 works shows that agentic RAG's iterative retrieval and memory systems introduce failure modes that static metrics and current benchmarks cannot detect.

arXiv:2603.07379

27

May 22, 2026 7 min read

Most Pages Get Zero AI Citations. Editing 5% Won 40% More.

A new paper from Virginia Tech maps four failure modes that prevent pages from being cited in AI-generated responses. 43% of relevant pages receive zero citations under baseline conditions.

arXiv:2603.09296

28

May 22, 2026 7 min read

AI Cites Sources It Never Checked — and Half Don't Hold Up

No LLM verifies even half its citations under any tested condition — and adding temporal cutoffs or other deployment constraints collapses verifiability to near zero.

arXiv:2603.07287

29

May 22, 2026 8 min read

Generative Search Citation Share Is a Noisy Estimator, Not a Score

A new statistical framework shows that single-run citation share metrics from Perplexity, SearchGPT, and Gemini carry confidence intervals wide enough to make most apparent SEO gains statistically indistinguishable from noise.

arXiv:2603.08924

30

May 22, 2026 7 min read

Schema Markup Barely Helps AI Find You. Restructuring Does.

Rewriting pages as entity documents lifted AI answer accuracy ~30%; adding JSON-LD did almost nothing — it gets cut before indexing.

arXiv:2603.10700

How AI decides what to cite, rank, and surface
