---
name: react-vite-seo-prerender
description: Make a React + Vite SPA visible to Google, link unfurlers, and AI crawlers (ChatGPT, Claude, Perplexity) by prerendering every public route to real HTML at build time, plus generating sitemap.xml, robots.txt, llms.txt, per-page OG/Twitter/canonical/JSON-LD tags, and optionally installing Google Tag Manager. Use when the user asks to fix SEO on a React/Vite site, get link previews working, get cited by AI assistants, add a sitemap, add llms.txt, or generally make a SPA crawlable without rewriting to Next.js/Remix/Astro.
---

# React + Vite SEO & AI-Visibility Pipeline

> **Disclaimer:** This skill was built and tested on Replit. The core (prerendering, sitemap, robots.txt, llms.txt, OG/JSON-LD tags) is portable, but the deployment glue — per-route rewrites in `artifact.toml`, the build-time rewrite guard, and the Puppeteer Chromium path notes — is Replit-specific. If you're shipping elsewhere (Vercel, Netlify, Cloudflare Pages, your own server), consult your host's docs for the equivalent routing config. No guarantees outside the original environment.

This skill turns a standard React + Vite SPA into a site that crawlers, link unfurlers, and AI assistants can actually read — *without* rewriting to a framework. The technique is build-time prerendering: snapshot every public route in headless Chromium, write a real `.html` file per URL, and ship that alongside the React bundle. Humans still get the SPA; bots get real HTML.

## When to use

- The user complains that their React/Vite site is "invisible to Google" or "shows a blank page in iMessage previews."
- The user wants AI assistants (ChatGPT, Claude, Perplexity, Google AI Overviews) to cite their site.
- The user asks for a sitemap, robots.txt, llms.txt, JSON-LD, Open Graph tags, or "SEO" on a SPA.
- The user wants to install Google Tag Manager / analytics on a SPA.

**Do not use** if the project is already on Next.js, Remix, Astro, or any other SSR/SSG framework — those handle this natively.

## What you build

By the end you will have:

1. A `useSeo()` React hook that sets per-page `<title>`, description, canonical URL, Open Graph, Twitter Card, robots index/noindex, and JSON-LD.
2. A `scripts/prerender.mjs` post-build step that serves the built `dist/` folder, crawls every route in headless Chromium, and writes a per-route `index.html` with the fully-rendered DOM.
3. Generated `sitemap.xml`, `robots.txt` (allowing major AI crawlers), and `llms.txt` at the site root.
4. `Organization` JSON-LD on the home page; `BlogPosting` JSON-LD on each blog post (if there's a blog).
5. Optional: Google Tag Manager snippet baked into `index.html` so it's present on every prerendered page.

## Step-by-step

### 1. Install dependencies

```bash
pnpm --filter <web-artifact> add -D puppeteer sirv
```

Then in the **root** `package.json`, opt puppeteer's postinstall in (pnpm skips install scripts by default):

```json
{
  "pnpm": {
    "onlyBuiltDependencies": ["puppeteer"]
  }
}
```

Re-run `pnpm install` to actually download Chromium.

### 2. Install system Chromium (works in BOTH dev and deploy)

Puppeteer's bundled Chrome can't launch in the Replit dev container (NixOS, missing libs at standard paths) AND can't launch in the deploy container (too stripped down, missing `libglib`, `libnspr4`, etc.). **Don't fight this** — install Chromium as a system Nix dependency, which is available in both environments because the deploy container inherits the same Nix config:

```js
await installSystemDependencies({ packages: ["chromium"] });
```

This adds `pkgs.chromium` to `replit.nix`. Then in the prerender script (step 4), resolve the binary at runtime via `command -v chromium` — this works in both dev and deploy without hard-coding a `/nix/store/...` hash that changes across rebuilds.

**Optional belt-and-suspenders fallback:** `pnpm add @sparticuz/chromium` and have the prerender script fall back to it when `command -v chromium` returns nothing. Useful if you ever run this on a non-Replit minimal container (AWS Lambda, Vercel, etc.).

### 3. Add the `useSeo` hook

Create `src/lib/site.ts`:

```ts
export const SITE_URL = (import.meta.env.VITE_SITE_URL ?? "https://example.com").replace(/\/$/, "");
export const SITE_NAME = "Your Site Name";

export function absoluteUrl(path: string): string {
  if (/^https?:\/\//i.test(path)) return path;
  return `${SITE_URL}${path.startsWith("/") ? path : "/" + path}`;
}
```

Create `src/lib/seo.ts` with a `useSeo({ title, description, canonicalPath, ogImage, noindex, jsonLd })` hook that manages tags in `document.head` via `useEffect`. Key responsibilities:

- Set `<title>` and `<meta name="description">`.
- Set `<meta property="og:title|og:description|og:url|og:image|og:type">` and `<meta name="twitter:card|twitter:title|twitter:description|twitter:image">`.
- Set `<link rel="canonical">` to `absoluteUrl(canonicalPath)`.
- Set `<meta name="robots" content="noindex,nofollow">` when `noindex: true`.
- Inject/replace one `<script type="application/ld+json" id="managed-jsonld">` per page.
- **Always emit `og:image` / `twitter:image`** — fall back to a site-wide default image (e.g. `/opengraph.jpg`) when the page doesn't supply one. Don't conditionally omit them, or pages without images ship without unfurl previews.
- At the end of the effect, signal the prerender that the page is ready: `requestAnimationFrame(() => requestAnimationFrame(() => { document.body.dataset.prerenderReady = "1"; }));`. The two nested rAFs delay the signal until React has committed and the browser has had a chance to paint.

Call `useSeo({...})` from every page component. For 404 / draft routes, pass `noindex: true`. For the home page, pass a JSON-LD `Organization` object. For blog posts, pass a `BlogPosting` with `headline`, `datePublished`, `author`, and `image`.

### 4. Write the prerender script

Create `scripts/prerender.mjs`. The skeleton:

```js
import sirv from "sirv";
import http from "node:http";
import puppeteer from "puppeteer";
import { execSync } from "node:child_process";
import { writeFileSync, mkdirSync } from "node:fs";
import { dirname, join } from "node:path";
// Assumed location and export name: the skeleton calls getBlogPostPaths()
// and the reference implementation ships scripts/blog-source.mjs.
import { getBlogPostPaths } from "./blog-source.mjs";

const PORT = Number(process.env.PORT ?? 5000);
const DIST = "dist/public"; // wherever vite output is
const ROUTES = ["/", "/privacy", "/accessibility", "/blog", ...(await getBlogPostPaths())];

// Serve the built files with SPA fallback so every route loads the app shell.
const handler = sirv(DIST, { single: true, dev: false });
const server = http.createServer(handler).listen(PORT);

// Pick the Chromium that actually launches in this environment.
let executablePath = process.env.PUPPETEER_EXECUTABLE_PATH;
if (!executablePath) {
  try {
    executablePath = execSync("command -v chromium", { encoding: "utf8" }).trim();
  } catch {
    // No system chromium on PATH: fall back to the optional package from step 2.
    const { default: chromium } = await import("@sparticuz/chromium");
    executablePath = await chromium.executablePath();
  }
}

const browser = await puppeteer.launch({
  executablePath,
  args: ["--no-sandbox", "--disable-setuid-sandbox", "--disable-dev-shm-usage"],
});

// CRITICAL: snap ALL routes before writing ANY of them.
const snaps = [];
for (const route of ROUTES) {
  const page = await browser.newPage();
  await page.goto(`http://localhost:${PORT}${route}`, { waitUntil: "networkidle0" });
  // Wait for the ready flag that useSeo sets at the end of its effect.
  await page.waitForFunction('document.body.dataset.prerenderReady === "1"', { timeout: 15000 });
  const html = await page.content();
  await page.close();
  snaps.push({ route, html });
  console.log(`[prerender] ✓ ${route}`);
}

// Only now is it safe to overwrite files inside DIST.
for (const { route, html } of snaps) {
  const out = join(DIST, route === "/" ? "index.html" : `${route}/index.html`);
  mkdirSync(dirname(out), { recursive: true });
  writeFileSync(out, html);
}

await browser.close();
server.close();
```

**The snap-then-write order is non-negotiable.** sirv's SPA fallback (`single: true`) serves `index.html` for every unknown route. The moment you overwrite `index.html` with the snapshot of `/`, every subsequent route navigation gets served *that* HTML instead of the real SPA shell — silent corruption, all later snaps come out empty. Always collect every snapshot in memory first, then write them all at the end.

### 5. Generate sitemap.xml, robots.txt, llms.txt

In the same `prerender.mjs`, after writing the snaps:

- `sitemap.xml`: one `<url>` per route, with `<lastmod>` from the post date for blog posts. Build each `<loc>` as an absolute URL, the same way `absoluteUrl(route)` does in `src/lib/site.ts`.
- `robots.txt`: `User-agent: *` / `Allow: /`, then named entries for `GPTBot`, `ClaudeBot`, `PerplexityBot`, `CCBot`, `Google-Extended` (all `Allow: /`), then `Sitemap: <absolute-url>`. The wildcard already allows them; the named entries make intent explicit and survive future bot-blocking defaults.
- `llms.txt`: plain text, one section per topic. Site name + one-line description, then a `## Pages` list of `- [Title](url): description` for home, blog index, and each published post. Keep it under 1500 lines.

### 6. Wire up the build script

In the web artifact's `package.json`:

```json
{
  "scripts": {
    "build": "vite build && node scripts/prerender.mjs"
  }
}
```

The build invocation takes three env vars. In the skeleton above, `PUPPETEER_EXECUTABLE_PATH` is an optional override (when it is unset, the script resolves Chromium via `command -v chromium`, as in step 2) and `PORT` defaults to 5000:

```bash
PUPPETEER_EXECUTABLE_PATH=/nix/store/<hash>-chromium-<ver>/bin/chromium \
PORT=5000 \
BASE_PATH=/ \
pnpm --filter <web-artifact> run build
```

### 7. Add per-route rewrites to `artifact.toml`

**This is mandatory for the prerender to actually be visible to crawlers.**

Replit's static hosting does NOT auto-resolve `index.html` for directory requests. A request to `/blog/` doesn't match any literal file path (the file is `/blog/index.html`), so without explicit rewrites it falls through to the SPA fallback `[from = "/*", to = "/index.html"]` — and every prerendered URL silently serves the homepage instead of its own HTML. Crawlers see one page; humans see the right page (React routes client-side). You won't notice unless you `curl` the URLs.

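One way to catch this failure is to compare the raw HTML each URL returns. The sketch below is illustrative only: `extractTitle`, `findShadowedRoutes`, and the sample data are hypothetical names, and it assumes every prerendered page has a title that differs from the homepage's.

```javascript
// Sketch: detect routes that are being answered with the homepage's HTML.
// `pages` maps a route to the raw HTML returned for it (e.g. from curl or fetch).

/** Pull the <title> text out of raw HTML; returns null when there is none. */
function extractTitle(html) {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return match ? match[1].trim() : null;
}

/**
 * Return the non-root routes whose title equals the homepage's title,
 * which suggests the SPA fallback answered instead of the prerendered file.
 */
function findShadowedRoutes(pages) {
  const homeTitle = extractTitle(pages["/"] ?? "");
  if (homeTitle === null) return [];
  return Object.keys(pages).filter(
    (route) => route !== "/" && extractTitle(pages[route]) === homeTitle,
  );
}

// Hypothetical sample responses, standing in for real fetches.
const sample = {
  "/": "<html><head><title>Home</title></head><body></body></html>",
  "/blog/": "<html><head><title>Home</title></head><body></body></html>",
  "/privacy/": "<html><head><title>Privacy</title></head><body></body></html>",
};
console.log(findShadowedRoutes(sample)); // → [ '/blog/' ]
```

Against a live deploy, `pages` would be filled from the response body of each route, with and without the trailing slash. An empty result means no route is returning the homepage's title.
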
For every prerendered route, add **two** rewrites (with and without trailing slash) BEFORE the SPA fallback in `.replit-artifact/artifact.toml`:

```toml
[[services.production.rewrites]]
from = "/blog"
to = "/blog/index.html"

[[services.production.rewrites]]
from = "/blog/"
to = "/blog/index.html"

# ... one pair per prerendered route ...

# SPA fallback — must be LAST.
[[services.production.rewrites]]
from = "/*"
to = "/index.html"
```

Use the artifacts skill's `verifyAndReplaceArtifactToml` to apply the change — don't edit `artifact.toml` in place.

**Build-time guard.** Because this list goes stale every time the user adds a blog post, end `prerender.mjs` with a check that reads `artifact.toml` and fails the build with copy-pasteable rewrite blocks if any prerendered route is missing:

```js
import { readFileSync } from "node:fs";
// `join` and `ROUTES` come from the top of prerender.mjs.
// import.meta.dirname needs Node 20.11+ (or 21.2+).

const tomlText = readFileSync(
  join(import.meta.dirname, "..", ".replit-artifact", "artifact.toml"),
  "utf8",
);

// Every `from = "..."` value currently declared in artifact.toml.
const declaredFroms = new Set(
  [...tomlText.matchAll(/from\s*=\s*"([^"]+)"/g)].map((m) => m[1]),
);

const missing = [];
for (const route of ROUTES) {
  if (route === "/") continue; // covered by the SPA fallback
  const slash = route.endsWith("/") ? route : route + "/";
  const noSlash = route.endsWith("/") ? route.slice(0, -1) : route;
  if (!declaredFroms.has(slash) || !declaredFroms.has(noSlash)) {
    missing.push({ slash, noSlash });
  }
}

if (missing.length > 0) {
  console.error(`[prerender] ✗ ${missing.length} route(s) missing rewrites in artifact.toml`);
  // Print both variants so the blocks can be pasted as-is.
  for (const { slash, noSlash } of missing) {
    const target = `${noSlash}/index.html`;
    console.error(`\n[[services.production.rewrites]]\nfrom = "${noSlash}"\nto = "${target}"`);
    console.error(`\n[[services.production.rewrites]]\nfrom = "${slash}"\nto = "${target}"`);
  }
  process.exit(1);
}
```

When the build fails, the user pastes the new blocks into a temp toml and you call `verifyAndReplaceArtifactToml`; the build then passes and the site can be republished.

### 8. (Optional) Install Google Tag Manager

GTM doesn't need React-specific setup — it works because `index.html` is the single template that the prerender bakes into every page. Get the container ID from the user (looks like `GTM-XXXXXXX`), then paste the standard GTM snippet into `artifacts/<web>/index.html`:

- The `<script>` block goes as high in `<head>` as possible (right after `<meta name="viewport">`).
- The `<noscript><iframe>` block goes immediately after the opening `<body>` tag.

Rebuild and verify with `grep -c GTM-XXXXXXX dist/public/index.html dist/public/blog/index.html`. Each file should report at least 2 matching lines (script + noscript); `grep -c` counts lines, not occurrences.

## Verification checklist

After running the build, spot-check the **raw HTML** (not the rendered page):

```bash
# Real content, not empty <div id="root">
grep -c "<h1" dist/public/index.html
grep -c "BlogPosting" dist/public/blog/<a-slug>/index.html
grep -c "og:image" dist/public/privacy/index.html

# Generated files exist
ls dist/public/{sitemap.xml,robots.txt,llms.txt}

# Per-page titles differ
grep "<title>" dist/public/index.html dist/public/blog/index.html dist/public/privacy/index.html
```

## Common pitfalls

- **Prerendered files exist but every URL serves the homepage** — the SPA-fallback rewrite (`/* → /index.html`) is shadowing your directory-style URLs. Replit static hosting doesn't auto-resolve `index.html` for `/foo/` requests. Add explicit rewrites per route BEFORE the SPA fallback. See step 7. This is the silent-failure mode that *only* shows up when you `curl` the live site — browsers look fine because React takes over and routes client-side.
- **You shipped the fix but an LLM/evaluator still says the site is empty** — almost always a stale cache, not your bug. LLM web-fetching tools (ChatGPT, Claude, Perplexity) and third-party site evaluators cache responses aggressively, often across conversations. After a deploy, **`curl` is the source of truth** — if curl shows the right HTML, the site is correct. To verify with an AI assistant: (1) ask a *different* model than the one that gave the stale verdict (independent caches), (2) add a junk query string like `?v=2` to force a fresh fetch on cached evaluators, (3) wait — a clean fix can look broken to chatbots for hours before their cache rolls over. If two different models agree the site is empty, *then* investigate; a disagreement between models is almost always a cache disagreement.
- **Bundled Chromium fails on NixOS AND in the deploy container** — different missing libs in each. Install system `chromium` via `installSystemDependencies` (lands in `replit.nix`, picked up by both environments) and resolve it at runtime via `command -v chromium`. See step 2.
- **Snap order corrupts SPA fallback** — see step 4.
- **`og:image` missing on pages without a per-page image** — always fall back to a site-wide default in `useSeo`.
- **Forgetting `BASE_PATH=/`** — Vite's `base` defaults to `/`, but if the artifact normally serves from a subpath in dev, the prerender will produce URLs with the subpath baked in. Set `BASE_PATH=/` explicitly for production builds.
- **Blocking CSS/JS in robots.txt** — never do this. Google needs to fetch your stylesheets and scripts to render the page and score mobile-friendliness, so blocking them can tank rankings. Allow everything; use `noindex` on pages you actually want hidden.
- **`robots.txt` is not a privacy tool** — it controls crawling, not indexing. A page disallowed in robots can still appear in search results if linked from elsewhere. Use `<meta name="robots" content="noindex">` to actually keep a URL out, and leave that page crawlable so the tag can be read.

## Reference implementation

The full working setup lives in `artifacts/kathleen-building/` of this project:

- `scripts/prerender.mjs`, `scripts/blog-source.mjs`
- `src/lib/{site.ts,seo.ts}`
- `src/pages/HomePage.tsx` (Organization JSON-LD), `src/pages/BlogPost.tsx` (BlogPosting JSON-LD)
- `index.html` (GTM snippet)
- `replit.md` "Prerender (SEO + AI visibility)" section
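
Returning to step 5, the three generated files can be sketched as plain string builders. This is a minimal sketch, not the reference implementation: `buildSitemap`, `buildRobots`, `buildLlmsTxt`, and the input shapes are hypothetical, `siteUrl` is assumed to have no trailing slash (as `SITE_URL` does), and the `#` heading plus `>` summary line in `llms.txt` is an assumed layout on top of what step 5 specifies.

```javascript
// Sketch of the step 5 generators. All names and input shapes are illustrative.
const AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot", "Google-Extended"];

/** Escape the characters XML reserves so URLs containing "&" stay valid. */
function xmlEscape(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

/** pages: [{ path, lastmod? }], where lastmod is a YYYY-MM-DD string. */
function buildSitemap(siteUrl, pages) {
  const urls = pages.map(({ path, lastmod }) => {
    const loc = `    <loc>${xmlEscape(siteUrl + path)}</loc>`;
    const mod = lastmod ? `\n    <lastmod>${lastmod}</lastmod>` : "";
    return `  <url>\n${loc}${mod}\n  </url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    "</urlset>",
    "",
  ].join("\n");
}

/** Wildcard group first, then one explicit Allow group per named AI crawler. */
function buildRobots(siteUrl) {
  const groups = ["*", ...AI_BOTS].map((ua) => `User-agent: ${ua}\nAllow: /`);
  return `${groups.join("\n\n")}\n\nSitemap: ${siteUrl}/sitemap.xml\n`;
}

/** pages: [{ title, url, description }] rendered as a "## Pages" list. */
function buildLlmsTxt(siteName, summary, pages) {
  const items = pages.map((p) => `- [${p.title}](${p.url}): ${p.description}`);
  return `# ${siteName}\n\n> ${summary}\n\n## Pages\n\n${items.join("\n")}\n`;
}

// Hypothetical sample data.
const siteUrl = "https://example.com";
console.log(buildSitemap(siteUrl, [{ path: "/" }, { path: "/blog/hello", lastmod: "2024-01-15" }]));
console.log(buildRobots(siteUrl));
```

In `prerender.mjs`, each returned string would be written into the build output next to the prerendered pages, for example with `writeFileSync(join(DIST, "sitemap.xml"), ...)`. Keeping the builders free of file I/O makes them easy to check in isolation.
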