May 27, 2026

How I made a React site visible to Google, link previews, and AI

A vanilla React + Vite site ships a blank page to anything that isn't a browser. Here is the small, boring fix — prerendered HTML, a real sitemap, sensible robots, and a friendly note to the LLMs.

Tags: seo, geo, react, vite, blog

This post was written by my Replit AI agent, who did the building. I asked the questions, made the calls, and decided what was worth shipping. We're writing this series together — and that's part of the story too.

This site is built with React and Vite. That is a perfectly good way to ship a marketing site in 2026 — until you ask a search engine, a link unfurler, or an AI assistant what your home page is about. And they all answer: nothing.

This is the story of why that happens and the small, boring fix I shipped here. If you are building a similar React + Vite site and want it to show up on Google, look right in an iMessage preview, and get cited by ChatGPT, this is the checklist.

Why this matters if you're not a developer

You don't need to understand React or Vite to care about this. Here's the short version: if you used a modern website builder or coding tool to make your site, there's a good chance that Google, iMessage link previews, and AI assistants like ChatGPT and Claude literally cannot read it. Not because your content is bad — because the technology serves a blank page to anything that isn't a full web browser.

That means your beautiful site might as well not exist when someone searches for you, shares your link in a group chat, or asks an AI "what does this company do." For a portfolio, a small business, or a personal brand, that's not a minor inconvenience — that's the whole point of having a site in the first place.

The fix is not complicated and you don't have to start over. That's what this post walks through.

The problem in one paragraph

A vanilla React + Vite app ships an almost-empty HTML file. Something like this:

<div id="root"></div>
<script type="module" src="/assets/index-abc123.js"></script>

That is the entire page. The actual content — your hero copy, your project descriptions, your blog post — is built in the browser after the JavaScript downloads and runs. A real browser handles that fine. A lot of other things don't.
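To make "almost-empty" concrete, here is a minimal sketch of what a fetcher that does not run JavaScript is left with. The `visibleText` helper and the sample `prerendered` markup are made up for illustration, and the tag stripping is a rough regex approximation, not a real HTML parser.

```typescript
// Rough approximation of what a non-JavaScript fetcher can read:
// drop <script>/<style>/<noscript> blocks, drop all remaining tags, keep the text.
function visibleText(html: string): string {
  return html
    .replace(/<(script|style|noscript)\b[^>]*>[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// The SPA shell from above.
const spaShell = `<div id="root"></div>
<script type="module" src="/assets/index-abc123.js"></script>`;

// Hypothetical prerendered version of the same page.
const prerendered = `<div id="root"><h1>Hello</h1><p>I build things.</p></div>
<script type="module" src="/assets/index-abc123.js"></script>`;

console.log(JSON.stringify(visibleText(spaShell)));    // ""
console.log(JSON.stringify(visibleText(prerendered))); // "Hello I build things."
```

Running the same check against your own deployed HTML is a quick way to see whether a crawler has anything to read.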

Who that actually hurts

  • Search engines on a bad day. Googlebot can run JavaScript, but rendering is a separate, deferred step after the initial fetch, so content that only exists after the JS runs can be indexed late or missed. Bing is more hit-or-miss. Smaller engines often don't try at all.
  • Link previews. When you paste your URL into iMessage, Slack, LinkedIn, Discord, or a Mastodon post, the unfurler fetches your page once and reads the HTML. It does not run JavaScript. If the title and description live in your JS bundle, the preview will be blank or generic.
  • AI assistants. ChatGPT, Claude, Perplexity, and Google's AI Overviews are increasingly the way people find things. Their crawlers behave more like an unfurler than a browser — they want real HTML. If your site is an empty shell, you do not exist to them.

For a portfolio or marketing site, those are the audiences. Being invisible to them is the whole bug.

The fix, in one sentence

Prerender every public route to real HTML at build time, and keep the React app on top of it for actual humans.

What "prerender" actually means

Before you deploy, you walk the SPA in a headless browser, snapshot the final HTML once it has finished rendering, and write a real .html file for every URL. The static host then serves that file. When a real visitor arrives, React still boots on top of it and the site feels like a normal SPA. When a crawler arrives, it gets a complete page with no JavaScript required.

That is the entire idea. You do not need to rewrite to Next.js or Remix. You do not need a server. You just need one extra step in your build.
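The extra build step can be sketched roughly like this. It is not the script this site uses: `prerenderRoutes`, `RenderPage`, and `fakeRender` are hypothetical names, and the renderer is injected so the example runs without a browser. In a real build the renderer would drive a headless browser (with Puppeteer, for instance, something along the lines of `page.goto(url)` followed by `page.content()`), and the snapshots would be written to disk.

```typescript
// Resolves a full URL to the final HTML of that page. Injected so this sketch
// has no dependencies; a real build would back it with a headless browser.
type RenderPage = (url: string) => Promise<string>;

async function prerenderRoutes(
  baseUrl: string,
  routes: string[],
  renderPage: RenderPage,
): Promise<Record<string, string>> {
  const origin = baseUrl.replace(/\/+$/, "");
  const snapshots: Record<string, string> = {};
  for (const route of routes) {
    const html = await renderPage(origin + route);
    // Fail the build rather than ship an empty shell for a public URL.
    if (/<div id="root">\s*<\/div>/.test(html)) {
      throw new Error(`Route ${route} rendered an empty #root`);
    }
    snapshots[route] = html;
  }
  return snapshots;
}

// Stand-in renderer for illustration only.
const fakeRender: RenderPage = async (url) =>
  `<div id="root"><h1>Rendered ${url}</h1></div>`;

// The port is an assumption: a locally served copy of the built site.
prerenderRoutes("http://localhost:4173", ["/", "/blog/my-post"], fakeRender)
  .then((pages) => console.log(Object.keys(pages)));
```

The empty-root check is a design choice worth keeping in any version of this: a prerender that silently snapshots the blank shell reproduces the original bug.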

Why a markdown blog (and not a CMS)

A blog with a database, an admin UI, and a hosted CMS is a lot of moving parts for a one-author site that posts once a month. So for this blog I went the other direction:

  1. Every post is a single markdown file in content/blog/.
  2. Frontmatter at the top of each file holds the title, date, excerpt, cover image, and tags.
  3. Setting draft: true keeps a post out of production.
  4. To publish, I commit the file. To unpublish, I flip the flag.

No CMS. No login. No database. The blog code is about 250 lines of TypeScript.
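As a rough illustration of how little code that takes, here is a sketch of a frontmatter reader and the draft filter. The field names (`title`, `date`, `excerpt`, `cover`, `tags`, `draft`), the comma-separated tag format, and the function names are assumptions for the example, not this blog's actual code. A real project would more likely lean on a YAML-aware library such as gray-matter.

```typescript
interface PostMeta {
  title: string;
  date: string;
  excerpt: string;
  cover: string;
  tags: string[];
  draft: boolean;
}

interface Post {
  meta: PostMeta;
  body: string;
}

// Parses a flat "key: value" frontmatter block delimited by "---" lines.
// Nested YAML, multi-line values, and bracketed lists are not handled.
function parsePost(source: string): Post {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(source);
  if (!match) throw new Error("Post is missing a frontmatter block");

  const fields: Record<string, string> = {};
  for (const line of match[1].split(/\r?\n/)) {
    const colon = line.indexOf(":");
    if (colon === -1) continue; // skip blank or malformed lines
    const key = line.slice(0, colon).trim();
    fields[key] = line.slice(colon + 1).trim().replace(/^["']|["']$/g, "");
  }

  return {
    meta: {
      title: fields["title"] ?? "",
      date: fields["date"] ?? "",
      excerpt: fields["excerpt"] ?? "",
      cover: fields["cover"] ?? "",
      tags: (fields["tags"] ?? "").split(",").map((t) => t.trim()).filter(Boolean),
      draft: fields["draft"] === "true",
    },
    body: match[2],
  };
}

// Drafts never reach the production build.
function publishedPosts(posts: Post[]): Post[] {
  return posts.filter((post) => !post.meta.draft);
}

const sample = [
  "---",
  "title: Hello world",
  "date: 2026-05-27",
  "excerpt: A first post.",
  "tags: seo, react",
  "draft: true",
  "---",
  "Body text here.",
].join("\n");

console.log(parsePost(sample).meta.tags); // [ 'seo', 'react' ]
```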

URLs stay clean — /blog/my-post, no .html extension — because the prerender writes each post to dist/blog/my-post/index.html and the static host serves it on the bare URL. This matters a little for aesthetics and a lot for canonical tags: one canonical URL per post, no ambiguity.
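The route-to-file mapping described above is small enough to show in full. A minimal sketch, with `outputPathForRoute` as a hypothetical helper name and `dist` as the assumed output folder:

```typescript
// Maps a clean route to the file the static host will serve for it.
// "/"              -> dist/index.html
// "/blog/my-post"  -> dist/blog/my-post/index.html
function outputPathForRoute(route: string, outDir: string = "dist"): string {
  // Ignore any query string or hash, then trim leading and trailing slashes.
  const clean = route.split(/[?#]/)[0].replace(/^\/+|\/+$/g, "");
  return clean === "" ? `${outDir}/index.html` : `${outDir}/${clean}/index.html`;
}

console.log(outputPathForRoute("/"));              // dist/index.html
console.log(outputPathForRoute("/blog/my-post"));  // dist/blog/my-post/index.html
console.log(outputPathForRoute("/blog/my-post/")); // dist/blog/my-post/index.html
```

Writing a folder with an index.html inside it, rather than my-post.html, is what lets most static hosts serve the page on the bare URL.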

The actual SEO checklist I shipped

None of this is exotic. It is the boring 1998 stuff that still works.

  • A unique <title> and <meta name="description"> on every page. Generated from the post frontmatter for blog posts, hand-written for the static pages.
  • Open Graph and Twitter Card tags so iMessage, Slack, and LinkedIn unfurlers have a title, description, and image to show. Blog posts get their cover image; static pages fall back to a site-level one.
  • A <link rel="canonical"> tag on every page pointing to the one true URL. With prerendered HTML this is also the same URL the React app uses, so there is no duplicate-content split.
  • A real sitemap.xml generated at build time from the same list of routes the prerender uses. Each blog post entry uses its publish date as lastmod. Drafts are not included.
  • A robots.txt that allows everything and points at the sitemap (more on this below).
  • JSON-LD structured data. The home page carries an Organization block. Each blog post carries a BlogPosting block with headline, publish date, author, and cover image. JSON-LD is just a small JSON snippet inside a <script type="application/ld+json"> tag — it gives crawlers a structured summary of the page they are already looking at.
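For the JSON-LD item, a sketch of what a BlogPosting generator could look like. The `PostInfo` shape and the function name are assumptions for illustration. The property names (`headline`, `datePublished`, `author`, `image`) are standard schema.org vocabulary.

```typescript
interface PostInfo {
  title: string;
  datePublished: string; // ISO date, e.g. "2026-05-27"
  authorName: string;
  coverImageUrl: string;
  url: string;
}

// Returns a ready-to-embed <script type="application/ld+json"> tag.
function blogPostingJsonLd(post: PostInfo): string {
  const data = {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    headline: post.title,
    datePublished: post.datePublished,
    author: { "@type": "Person", name: post.authorName },
    image: post.coverImageUrl,
    mainEntityOfPage: post.url,
  };
  // Escape "<" so a title containing "</script>" cannot close the tag early.
  // "\u003c" is still valid JSON and parses back to "<".
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}

console.log(
  blogPostingJsonLd({
    title: "How I made a React site visible",
    datePublished: "2026-05-27",
    authorName: "Example Author",
    coverImageUrl: "https://example.com/cover.png",
    url: "https://example.com/blog/my-post",
  }),
);
```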

Everything above ends up in the prerendered HTML, so anything that reads the page sees it.
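The sitemap step from the checklist can be sketched as a pure function over the route list. `SitemapEntry` and `buildSitemap` are hypothetical names. The XML namespace is the one defined by the sitemaps.org protocol.

```typescript
interface SitemapEntry {
  path: string;      // e.g. "/blog/my-post"
  lastmod?: string;  // publish date, ISO format
  draft?: boolean;   // drafts are left out
}

function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

function buildSitemap(siteUrl: string, entries: SitemapEntry[]): string {
  const origin = siteUrl.replace(/\/+$/, "");
  const urls = entries
    .filter((entry) => !entry.draft)
    .map((entry) => {
      const loc = `<loc>${escapeXml(origin + entry.path)}</loc>`;
      const lastmod = entry.lastmod
        ? `<lastmod>${escapeXml(entry.lastmod)}</lastmod>`
        : "";
      return `  <url>${loc}${lastmod}</url>`;
    });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    "</urlset>",
    "",
  ].join("\n");
}

console.log(
  buildSitemap("https://example.com", [
    { path: "/" },
    { path: "/blog/my-post", lastmod: "2026-05-27" },
    { path: "/blog/unfinished", draft: true },
  ]),
);
```

Feeding this the same route list the prerender uses is what keeps the sitemap and the generated pages from drifting apart.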

The GEO layer

GEO stands for generative engine optimization — the umbrella term for "what makes your stuff show up in ChatGPT answers." There is no clean checklist for it yet, but the order of importance is pretty clear:

  1. Real HTML. If an AI crawler fetches your URL and gets an empty <div id="root">, nothing else matters. Prerendering is table stakes.
  2. Semantic structure. Use <h1>/<h2>/<h3> for real hierarchy, <article> for posts, plain prose with short paragraphs. AI crawlers are extractive — they reward content that is easy to chunk.
  3. JSON-LD. Same BlogPosting and Organization blocks the search engines like. AI summarizers read them too.
  4. An llms.txt file at the root. This is a recent convention (think robots.txt, but for LLMs) — a plain-text summary of what the site is, who runs it, and links to the key pages. Mine lists the home page, the blog index, and every published post.
  5. Allow AI crawlers in robots.txt. If you block them you cannot be cited. More on that next.
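For item 4, a sketch of generating llms.txt from the same page list at build time. The layout here (a title line, a one-line summary, then a list of links) follows the commonly circulated llms.txt proposal as I understand it. Treat the exact shape, and the `buildLlmsTxt` name, as assumptions rather than a spec.

```typescript
interface LlmsPage {
  title: string;
  path: string;
  note?: string; // optional one-line description
}

interface SiteInfo {
  name: string;
  summary: string;
  url: string;
}

function buildLlmsTxt(site: SiteInfo, pages: LlmsPage[]): string {
  const origin = site.url.replace(/\/+$/, "");
  const links = pages.map((page) => {
    const link = `- [${page.title}](${origin}${page.path})`;
    return page.note ? `${link}: ${page.note}` : link;
  });
  return [`# ${site.name}`, "", `> ${site.summary}`, "", "## Pages", "", ...links, ""].join("\n");
}

console.log(
  buildLlmsTxt(
    { name: "Example Site", summary: "A portfolio and blog.", url: "https://example.com" },
    [
      { title: "Home", path: "/" },
      { title: "Blog", path: "/blog" },
      { title: "My post", path: "/blog/my-post", note: "How the site was made crawlable" },
    ],
  ),
);
```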

robots.txt — what it is good for, what it is not

Two things to know.

It is good for telling crawlers where your sitemap lives, opting bots in or out, and managing crawl budget on a large site. A reasonable starting robots.txt for a small site that wants to be findable by both search and AI:

User-agent: *
Allow: /

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

Sitemap: https://example.com/sitemap.xml

The wildcard at the top already allows the AI bots — the named entries are mostly there to make the intent obvious to a human reading the file.
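If the sitemap is generated at build time, it is convenient to generate robots.txt next to it so the two never disagree about the site URL. A minimal sketch that reproduces the file above; `buildRobotsTxt` is a hypothetical helper name.

```typescript
// Emits an allow-everything robots.txt with optional named groups
// and a Sitemap line derived from the same site URL.
function buildRobotsTxt(siteUrl: string, namedBots: string[] = []): string {
  const groups = ["*", ...namedBots].map(
    (agent) => `User-agent: ${agent}\nAllow: /`,
  );
  const origin = siteUrl.replace(/\/+$/, "");
  return `${groups.join("\n\n")}\n\nSitemap: ${origin}/sitemap.xml\n`;
}

console.log(buildRobotsTxt("https://example.com", ["GPTBot", "ClaudeBot"]));
```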

It is not good for hiding pages. A Disallow in robots.txt tells well-behaved bots not to crawl a URL, but the URL itself can still appear in search results if other sites link to it. To actually keep a page out of results, use <meta name="robots" content="noindex"> on the page, and leave that page crawlable: a bot that is blocked from fetching the page never sees the tag. I add this to my 404 page and to anything draft-like that slips through.

And the rule of thumb that trips most people up: never block /assets/, your CSS, or your JS. Google needs to fetch your stylesheets and scripts to render the page the way a visitor sees it, including how it looks on mobile. Block them and Google evaluates a broken page, which can hurt your ranking. There is no shortcut here.
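One way to keep that rule from regressing is a small build-time check on the robots.txt you are about to ship. This is a heuristic sketch, not a full robots.txt parser: `findBlockedAssetRules` is a hypothetical name, it ignores which User-agent group a rule belongs to, and it only looks for the obvious patterns.

```typescript
// Returns the Disallow lines that look like they block /assets/, CSS, or JS.
function findBlockedAssetRules(robotsTxt: string): string[] {
  return robotsTxt
    .split(/\r?\n/)
    .map((line) => line.replace(/#.*$/, "").trim()) // drop comments
    .filter((line) => /^disallow\s*:/i.test(line))
    .filter((line) => {
      const path = line.slice(line.indexOf(":") + 1).trim();
      return /^\/assets(\/|$)/.test(path) || /\.(css|js)\b/.test(path);
    });
}

const risky = [
  "User-agent: *",
  "Disallow: /admin/",
  "Disallow: /assets/ # speeds up crawl",
  "Disallow: /*.js$",
].join("\n");

console.log(findBlockedAssetRules(risky));
// [ 'Disallow: /assets/', 'Disallow: /*.js$' ]
```

Failing the build when this list is non-empty is cheaper than finding out from a ranking drop.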

The duplicate-content question

A reasonable worry: "I now have a prerendered HTML version and a React-rendered version of the same page. Won't Google double-count and penalize me?"

No. Two reasons:

  • There is only ever one URL per page. The canonical tag on the page points at that single URL.
  • The prerendered HTML and the React hydration are showing the same content, the same way responsive sites show the same content at different widths. Google treats it as one page.
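The "one URL per page" rule is easier to keep if every page derives its canonical tag from a single helper. A sketch, with `canonicalUrl` and `canonicalTag` as hypothetical names, which assumes paths start with a slash and contain nothing that needs HTML escaping:

```typescript
// Collapses the variants a static host may serve onto one canonical URL:
// trailing slashes, explicit /index.html, query strings, and hashes.
function canonicalUrl(siteUrl: string, path: string): string {
  const clean = path
    .split(/[?#]/)[0]
    .replace(/\/index\.html$/, "/")
    .replace(/\/+$/, "");
  return siteUrl.replace(/\/+$/, "") + (clean === "" ? "/" : clean);
}

function canonicalTag(siteUrl: string, path: string): string {
  return `<link rel="canonical" href="${canonicalUrl(siteUrl, path)}">`;
}

console.log(canonicalUrl("https://example.com", "/blog/my-post/"));           // https://example.com/blog/my-post
console.log(canonicalUrl("https://example.com", "/blog/my-post/index.html")); // https://example.com/blog/my-post
console.log(canonicalTag("https://example.com", "/blog/my-post?utm_source=x"));
```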

I packaged this up so you don't have to

Because I am building this site on Replit, the whole recipe above — prerender, useSeo hook, sitemap, robots.txt, llms.txt, JSON-LD, GTM install — is now bottled as two reusable "skills" the Replit Agent can apply to any project:

  • react-vite-seo-prerender — drop this onto any React + Vite site and the agent will wire up the same prerender + metadata pipeline, including the Chromium-on-NixOS workaround that took me an hour to find.
  • react-vite-markdown-blog — same thing for the file-based markdown blog: one folder per post, frontmatter, draft flag, no CMS.

Both are plain markdown — right-click and "Save link as" to download. To use one in your own Replit project, save the file to .agents/skills/<skill-name>/SKILL.md and the agent will pick it up the next time you ask it for that kind of work. Or just open the file and follow the steps by hand — it reads like a recipe.

One thing to take away

A React site is not invisible to crawlers because crawlers are broken. It is invisible because you handed them the wrong file. Hand them a real one — at build time, automatically, every time — and the rest is the same boring checklist it has always been.