Schema Markup Barely Helps AI Find You. Restructuring Does.

A paper from WordLift, published on arXiv in March, tests a claim that circulates widely in SEO and content strategy: that adding JSON-LD structured markup to web pages makes them more retrievable by AI systems. The researchers ran a controlled factorial experiment with 349 queries and 158 entities. They crossed three document formats with two retrieval modes and added an "enhanced+" variant of the entity page, for 7 retrieval conditions in total. The result is not what the conventional advice predicts.

JSON-LD markup appended to existing HTML produces a statistically significant but practically negligible accuracy improvement. The large gain — roughly 30% — comes from a different intervention entirely: redesigning the document as a knowledge-graph-derived entity page, with entity relationships materialized as readable text rather than encoded in a markup block. These are not the same thing, and the paper's main contribution is demonstrating the difference empirically.

The number that frames the study: 88% of JSON-LD documents in the experiment exceed the character truncation limit of Vertex AI Vector Search before indexing. The JSON-LD block starts at a median position of character 18,510 — right at the boundary where content gets cut.

The JSON-LD Assumption Doesn't Hold

The standard recommendation for making web content more legible to AI systems is to append JSON-LD structured data to the page. The assumption is that retrieval pipelines can parse this markup and use it to improve answer quality. The experiment tests this directly.

d = 0.18: effect size for JSON-LD added to HTML (accuracy only; completeness not significant)

88%: share of JSON-LD documents exceeding the vector search truncation limit before indexing

Adding JSON-LD to plain HTML raises mean accuracy on the 5-point scale from 3.62 to 3.89. That is a Cohen's d of 0.18, classified as a small effect. The completeness improvement does not survive Bonferroni correction for multiple comparisons. Across 349 queries, the practical difference is negligible.
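Two statistics do the work here. Cohen's d divides the difference in mean scores by a pooled standard deviation. Bonferroni correction holds each of m comparisons to a threshold of alpha / m. The sketch below shows both. It assumes the common independent-samples form of d. The paper may use a paired variant, because the same queries are scored under every condition. The scores are made-up placeholders, not data from the study.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(baseline: list[float], treated: list[float]) -> float:
    """Standardized mean difference (treated minus baseline) using the pooled sample SD."""
    n1, n2 = len(baseline), len(treated)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(baseline) + (n2 - 1) * variance(treated)) / (n1 + n2 - 2)
    return (mean(treated) - mean(baseline)) / sqrt(pooled_var)


def bonferroni_significant(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Flag which p-values stay significant when alpha is split across all m comparisons."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


# Hypothetical 1-5 judge scores, not data from the study.
baseline = [3, 4, 4, 3, 5]
treated = [4, 4, 5, 3, 5]
print(round(cohens_d(baseline, treated), 2))   # 0.48
print(bonferroni_significant([0.01, 0.03]))     # [True, False]: threshold is 0.025
```

In the second call, a p-value of 0.03 would pass an uncorrected 0.05 test. It fails once the threshold is split across two comparisons. A result can therefore lose significance after correction, as the completeness gain does here.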

The infrastructure explanation is more damning than the effect size. Even without markup, 82% of plain HTML documents exceed the vector search truncation limit; for JSON-LD documents, the figure rises to 88%. The JSON-LD block is appended at the end of the HTML, with a median starting position of character 18,510. That places it at or beyond the point where the embedding model stops reading. The markup is present in the source. It is not present in the index.
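The toy example below shows the position effect. It assumes the pipeline keeps only the first N characters of each document before embedding. The character cap, the page, and the JSON-LD keys are hypothetical. The article does not state the exact limit, and some pipelines truncate by tokens rather than characters. The point is only that the same facts survive or vanish depending on where they sit.

```python
def indexed_prefix(document: str, char_limit: int) -> str:
    """What the embedding step sees, assuming a hard cut after char_limit characters."""
    return document[:char_limit]


def facts_in_index(document: str, facts: list[str], char_limit: int) -> list[str]:
    """Return the facts whose text survives truncation."""
    prefix = indexed_prefix(document, char_limit)
    return [fact for fact in facts if fact in prefix]


CHAR_LIMIT = 10_000  # hypothetical cap, for illustration only
FACTS = ["500 m", "1998"]
chrome = "<div>navigation, menus, footer links</div>" * 400  # 16,800 characters of page chrome

# Layout A: the facts exist only in a JSON-LD block appended after the page body.
# (Illustrative keys; not validated schema.org markup.)
appended = (
    "<html><body>" + chrome
    + '<script type="application/ld+json">{"elevation": "500 m", "foundingDate": "1998"}</script>'
    + "</body></html>"
)
# Layout B: the same facts written as visible text ahead of the page chrome.
inline = "<html><body><p>Example Hotel, founded in 1998, sits at 500 m.</p>" + chrome + "</body></html>"

print(facts_in_index(appended, FACTS, CHAR_LIMIT))  # []
print(facts_in_index(inline, FACTS, CHAR_LIMIT))    # ['500 m', '1998']
```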

This isn't a finding about JSON-LD semantics. It's a finding about where JSON-LD sits in the document relative to where the retrieval pipeline stops. Pipelines with structured-data-aware ingestion, or different truncation limits, may behave differently. But for the specific infrastructure tested here — Vertex AI Vector Search 2.0 with gemini-embedding-001 — the most common implementation of structured data optimization is being silently neutralized before retrieval begins.

The Real Gain Comes From Restructuring the Document, Not Annotating It

The enhanced entity page format does something structurally different from appending markup. It materializes related-entity data inline as readable text, adds navigational affordances with dereferenceable URIs, and embeds agent instructions. Entity relationships that plain HTML encodes as opaque URIs become explicit prose. The document is redesigned around the entity, not annotated after the fact.
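The hypothetical sketch below makes the contrast concrete with two renderings of one entity. The baseline points at a related entity by URI only. The enhanced version looks the related entity up in the knowledge graph and writes its attributes out as text. It also keeps the URI as a link an agent could follow. The entity data, URIs, and `render_*` helpers are invented for illustration. This is not WordLift's page generator, and it omits the agent instructions the paper's format also embeds.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    uri: str
    name: str
    attributes: dict[str, str] = field(default_factory=dict)
    links: dict[str, str] = field(default_factory=dict)  # predicate -> URI of a related entity


def render_baseline(entity: Entity) -> str:
    """Plain rendering: related entities appear only as opaque URIs."""
    lines = [entity.name]
    lines += [f"{key}: {value}" for key, value in entity.attributes.items()]
    lines += [f"{pred}: {uri}" for pred, uri in entity.links.items()]
    return "\n".join(lines)


def render_enhanced(entity: Entity, graph: dict[str, Entity]) -> str:
    """Entity-page rendering: look up each related entity and write its facts out as text."""
    lines = [entity.name]
    lines += [f"{key}: {value}" for key, value in entity.attributes.items()]
    for pred, uri in entity.links.items():
        related = graph.get(uri)
        if related is None:
            lines.append(f"{pred}: {uri}")  # nothing to materialize; keep the bare URI
            continue
        facts = "; ".join(f"{key}: {value}" for key, value in related.attributes.items())
        suffix = f" ({facts})" if facts else ""
        # Readable facts first, then the dereferenceable URI as a link for agents to follow.
        lines.append(f"{pred}: {related.name}{suffix}. Details: {uri}")
    return "\n".join(lines)


# Hypothetical two-entity graph for illustration.
lake = Entity("https://example.org/kg/lake", "Example Lake", {"elevation": "500 m"})
hotel = Entity("https://example.org/kg/hotel", "Example Hotel", {"stars": "4"}, {"near": lake.uri})
graph = {e.uri: e for e in (lake, hotel)}
print(render_baseline(hotel))
print(render_enhanced(hotel, graph))
```

In the baseline output, a question about the hotel's surroundings can only be answered by resolving the URI. In the enhanced output, the answer sits in the same chunk as the hotel itself.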

+29.6%: accuracy gain from enhanced entity pages over plain HTML in standard RAG, matching the full agentic pipeline's +29.8% on the same format

In standard RAG, this format achieves a mean accuracy of 4.69 against the plain HTML baseline of 3.62. That is a +29.6% improvement, with a Cohen's d of 0.60. The full agentic pipeline on the same format reaches 4.70, a +29.8% gain over baseline (d = 0.61). By Cohen's conventional benchmarks (0.2 small, 0.5 medium, 0.8 large), both are medium effects, more than three times the size of the JSON-LD effect.

The gain is sharpest for factual queries: accuracy rises from 2.74 to 4.57, a +66.8% improvement. This makes sense structurally — factual queries require precise attribute retrieval, and the enhanced format surfaces those attributes as extractable text rather than requiring the model to resolve a URI or infer a relationship. Synthesis-heavy queries benefit less because the bottleneck there isn't attribute visibility.

Domain matters considerably. SalzburgerLand (travel) moves from 2.19 to near-ceiling, with a Cohen's d of 2.47 for the enhanced agentic condition. The WordLift Blog (editorial) domain shows a d of 2.73. BlackBriar (e-commerce) starts at 4.92 out of 5 and ends at 4.91 — essentially no movement. Product pages in e-commerce already render key entity properties as visible HTML. The format change adds nothing when the baseline already surfaces the relevant attributes.
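The percentages in this section are relative changes in mean accuracy, so they can be rechecked from the reported means. The same arithmetic makes the ceiling effect explicit. On a scale that tops out at 5, a domain starting at 4.92 has less than 2% of relative gain left to capture. The check below uses only figures quoted above.

```python
SCALE_MAX = 5.0  # accuracy is scored out of 5


def relative_gain(before: float, after: float) -> float:
    """Relative change in mean accuracy, in percent."""
    return (after - before) / before * 100


def max_possible_gain(before: float, scale_max: float = SCALE_MAX) -> float:
    """Largest relative gain still available below the scale ceiling, in percent."""
    return (scale_max - before) / before * 100


reported = {
    "HTML -> HTML+JSON-LD": (3.62, 3.89),
    "HTML -> enhanced, standard RAG": (3.62, 4.69),
    "HTML -> enhanced, agentic RAG": (3.62, 4.70),
    "factual queries": (2.74, 4.57),
}
for label, (before, after) in reported.items():
    print(f"{label}: {relative_gain(before, after):+.1f}%")

for domain, baseline in {"BlackBriar": 4.92, "SalzburgerLand": 2.19}.items():
    print(f"{domain}: at most {max_possible_gain(baseline):+.1f}% possible gain")
```

BlackBriar's baseline leaves about 1.6% of possible gain, while SalzburgerLand's leaves over 128%. That gap explains most of the difference in their effect sizes before any question of format arises.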

Agentic RAG Helps — Until the Document Format Is Fixed

On HTML+JSON-LD documents, switching from standard to agentic RAG yields a meaningful improvement: +13.1% accuracy and +20.1% completeness. The agent compensates for what the document doesn't surface directly — following links, querying the knowledge graph, making additional tool calls to assemble an answer the static document doesn't provide cleanly.

+13.1%: accuracy gain from agentic RAG over standard RAG on HTML+JSON-LD documents

+0.2%: accuracy gain from agentic RAG over standard RAG when the document format is already enhanced

When the document format is already optimized, the agent adds almost nothing. Enhanced standard RAG scores 4.69; enhanced agentic RAG scores 4.70. The difference is not statistically significant. In this experiment, the agent's value shrinks as the document's structure improves.

The agentic metrics reinforce this. Enhanced+ pages expose 102.2 discoverable links per query on average, compared to 41.7 for plain HTML. Yet agents follow fewer links on enhanced pages — 0.4 per query versus 1.0 on plain HTML. The agent is doing less work because the document already contains what it needs. When the document is poor, the agent compensates. When the document is good, the agent is largely redundant for accuracy.

This reframes the ROI calculation for agentic pipeline investment. If the primary value of agentic retrieval is compensating for poor document structure, then improving document structure and simplifying the pipeline may achieve the same outcome at lower cost and complexity.

What the Study Can't Prove

The enhanced entity page differs from plain HTML in two ways simultaneously: it restructures the layout and it materializes data from related entities that the baseline only references via opaque URIs. No ablation separates these factors. The 29.6% gain could come from layout restructuring, from entity data materialization, or from both in combination. The paper doesn't resolve this.

Critical: evaluation circularity (ground truth derived from the same KG as the enhanced pages)

Critical: no ablation isolating layout restructuring from entity data materialization

Moderate: conflict of interest (WordLift authored the study using WordLift KG infrastructure)

Moderate: results tied to specific vector search truncation behavior; may not generalize

The evaluation has a circularity problem. Ground-truth answers are derived from the same knowledge graph used to construct the enhanced pages. Conditions that present knowledge-graph data more directly (the paper's C3, C6, and C6+) may score higher for a reason unrelated to retrieval quality. The evaluation may partly reward textual proximity to knowledge-graph-derived facts. The researchers don't flag this explicitly.

WordLift, the primary institution, is also the provider of the knowledge graph infrastructure under study. This is a conflict of interest the paper does not acknowledge. The results may be accurate, but independent replication on third-party knowledge graphs and different retrieval infrastructure would substantially strengthen the claims.

The dataset is also unevenly distributed in ways that affect aggregate interpretation. BlackBriar accounts for 39% of queries and contributes virtually no improvement due to ceiling-level baseline performance. WordLift Blog represents only 6% of queries — 22 total — and produces the largest effect sizes. Aggregate accuracy numbers blend these very different situations.

Before the Next Structured Data Implementation

The practical implication is not that structured data is useless. It's that the implementation layer matters more than the presence of markup.

JSON-LD appended to the bottom of a long HTML page is likely to be truncated before indexing in any pipeline that caps input length the way the one tested here does. Checking where your JSON-LD sits relative to your pipeline's effective character limit is a more useful diagnostic than checking whether the markup validates. If the markup never reaches the index, its semantic quality is irrelevant.
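One way to run that diagnostic is to scan each page for the first `application/ld+json` script and compare its character offset with your pipeline's limit. The sketch below does this across a batch of pages. It reports the same kinds of figures the study cites: share of pages over the limit and median JSON-LD starting offset. It uses a regular expression as a rough heuristic rather than a full HTML parser. The limit is a placeholder to replace with your own pipeline's value. Offsets are counted on raw HTML, which assumes the pipeline truncates raw markup rather than extracted text.

```python
import re
from statistics import median
from typing import Optional

# Opening tag of a JSON-LD block; a regex heuristic, not a full HTML parser.
JSONLD_OPEN = re.compile(
    r"<script\b[^>]*\btype\s*=\s*[\"']application/ld\+json[\"']", re.IGNORECASE
)


def jsonld_offset(html: str) -> Optional[int]:
    """Character offset where the first JSON-LD block starts, or None if there is none."""
    match = JSONLD_OPEN.search(html)
    return match.start() if match else None


def audit(pages: dict[str, str], char_limit: int) -> dict:
    """Summarize a set of raw HTML pages against a character-based truncation limit."""
    offsets = {url: jsonld_offset(html) for url, html in pages.items()}
    found = {url: off for url, off in offsets.items() if off is not None}
    over_limit = [url for url, html in pages.items() if len(html) > char_limit]
    return {
        "share_over_limit": len(over_limit) / len(pages) if pages else 0.0,
        "median_jsonld_offset": median(found.values()) if found else None,
        # Pages whose JSON-LD opens at or past the cut, so none of it is embedded.
        "jsonld_fully_cut": sorted(url for url, off in found.items() if off >= char_limit),
    }


CHAR_LIMIT = 10_000  # placeholder; substitute your pipeline's actual limit
long_page = "<html><body>" + "x" * 25_000 + '<script type="application/ld+json">{}</script></body></html>'
head_page = '<html><head><script type="application/ld+json">{}</script></head></html>'
print(audit({"long": long_page, "head": head_page}, CHAR_LIMIT))
```

A page listed under `jsonld_fully_cut` is carrying markup that, under this assumed truncation model, never reaches the embedding step.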

The larger finding is about document architecture. The 30% accuracy gain in this experiment comes from making entity relationships legible as text — surfacing attributes inline, linking to related entities with dereferenceable URIs, structuring the document around what a retrieval model needs to extract rather than what a human browser renders. That's a content design problem, not a markup problem.

For domains where baseline HTML already surfaces key attributes as visible text — e-commerce product pages being the clearest example — the format change adds nothing. The intervention is most valuable where entity relationships are currently implicit, encoded in URIs, or buried in prose that doesn't support clean extraction.

The agentic pipeline finding is a useful corrective to current investment patterns. If agentic retrieval is primarily compensating for poor document structure, fixing the documents is the more durable solution.

Key Takeaway

Across 349 queries and 7 retrieval conditions, the ~30% accuracy gain attributed to 'structured data' comes from redesigning the document as a knowledge-graph entity page. It does not come from appending JSON-LD markup. That markup yields only a small effect (d = 0.18) and, in most of the tested documents, is at least partly cut off by truncation before indexing.

If you're relying on appended JSON-LD to make your content more retrievable by AI systems, you may be optimizing a layer that never reaches the index; the actual lever is restructuring page content to surface entity relationships as readable text.

Source

Volpini, Andrea; Raad, Elie; Gamba, Beatrice; Riccitelli, David (2026). Structured Linked Data as a Memory Layer for Agent-Orchestrated Retrieval. arXiv:2603.10700
