Technical Documentation · 03
Data flow, start to finish
One vertical read of how a piece of content becomes structured data an AI can trust — split into build time (what the admin does) and request time (what a visitor or crawler triggers).
Index the site
Admin picks post types and clicks Populate. Posts are read in batches of 20 and written as rows.
▤ wp_gd78_GEOData
url · slug · post_type · template · title · post_meta (whitelisted JSON, incl. resolved image_url) · classification · json_ld · schema_last_updated
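The batched read described above can be sketched as follows. The plugin itself runs inside WordPress, so this Python version only illustrates the batching arithmetic; `iter_batches` and the sample post IDs are hypothetical names, and only the batch size of 20 comes from the text.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")
BATCH_SIZE = 20  # batch size stated in the text


def iter_batches(items: Sequence[T], size: int = BATCH_SIZE) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# 45 hypothetical post IDs are split into two full batches and one partial batch.
post_ids = list(range(1, 46))
batch_sizes = [len(batch) for batch in iter_batches(post_ids)]
print(batch_sizes)  # [20, 20, 5]
```

Working in fixed-size batches keeps each read bounded, which matters on sites with many thousands of posts.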
Classify each row
Admin assigns a schema.org type (Recipe, Product, Article…) on the Classification screen or post meta box.
Request generation
From the Schema Writer (one row or "generate all"), the plugin sends { url, classification, metadata } to the API.
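A minimal sketch of how that request body might be assembled from one stored row. The three keys come from the text; the helper name `build_request_payload` and the assumption that `metadata` is simply the decoded `post_meta` column are illustrative, not confirmed by the source.

```python
import json


def build_request_payload(row: dict) -> str:
    """Serialize one row into the { url, classification, metadata } body."""
    post_meta = row.get("post_meta") or "{}"
    # Assumption: post_meta is stored as a JSON string and sent as `metadata`.
    metadata = json.loads(post_meta) if isinstance(post_meta, str) else post_meta
    payload = {
        "url": row["url"],
        "classification": row["classification"],
        "metadata": metadata,
    }
    return json.dumps(payload)


# Hypothetical row, shaped like the table columns listed earlier.
example_row = {
    "url": "https://example.com/the-post",
    "classification": "Recipe",
    "post_meta": '{"image_url": "https://example.com/img.jpg"}',
}
body = build_request_payload(example_row)
print(body)
```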
⟶ INSIDE THE API · geoapipy.com · stateless FastAPI
Fetch
Pull the page over httpx & verify it's HTML.
fetch_html()
Extract
Trafilatura isolates main text; off-domain links harvested.
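The off-domain link harvest can be illustrated with the standard library alone. This sketch does not use Trafilatura and is not the API's actual code; `off_domain_links` and `_LinkCollector` are hypothetical names, and "off-domain" is assumed to mean "hostname differs from the page's hostname".

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def off_domain_links(html: str, page_url: str) -> list:
    """Return absolute http(s) links whose host differs from the page's host."""
    page_host = urlparse(page_url).hostname
    collector = _LinkCollector()
    collector.feed(html)
    found = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links first
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https") or not parsed.hostname:
            continue  # skip mailto:, javascript:, and similar
        if parsed.hostname != page_host and absolute not in found:
            found.append(absolute)
    return found


sample_html = (
    '<a href="/about">About</a>'
    '<a href="https://other.org/x">Source</a>'
    '<a href="https://example.com/y">Internal</a>'
    '<a href="mailto:a@b.c">Mail</a>'
)
links = off_domain_links(sample_html, "https://example.com/the-post")
print(links)  # ['https://other.org/x']
```

Note that this simplified comparison treats `www.example.com` and `example.com` as different hosts; a real implementation would likely normalize them.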
extract_main_content()
Prompt
Load the type's property whitelist; assemble 10 hard rules.
build_prompt()
Generate
Call models in order; validate that a JSON object is returned.
generate_json_ld()
Failover: gemini-3.1-flash-lite → gpt-oss-20b → gpt-5.4-nano · all fail → 502
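The ordered failover can be sketched as a loop that moves to the next model whenever a call raises or returns something other than a JSON object. The model order and the 502 outcome come from the text; `generate_with_failover`, `AllModelsFailed`, and the `call_model` callback are hypothetical names standing in for whatever client code the API really uses.

```python
import json
from typing import Callable, Sequence

MODEL_ORDER = ("gemini-3.1-flash-lite", "gpt-oss-20b", "gpt-5.4-nano")


class AllModelsFailed(Exception):
    """Raised when every model fails; a web layer could map this to HTTP 502."""

    status_code = 502


def generate_with_failover(
    prompt: str,
    call_model: Callable[[str, str], str],
    models: Sequence[str] = MODEL_ORDER,
) -> dict:
    for model in models:
        try:
            parsed = json.loads(call_model(model, prompt))
        except Exception:
            continue  # call failed or output was not valid JSON: try the next model
        if isinstance(parsed, dict):
            return parsed  # first JSON object wins
    raise AllModelsFailed("no model returned a JSON object")


def flaky_call(model: str, prompt: str) -> str:
    """Hypothetical stand-in for a real model client."""
    if model == "gemini-3.1-flash-lite":
        raise TimeoutError("simulated outage")
    if model == "gpt-oss-20b":
        return "Sorry, here is your schema"  # not JSON, so it is rejected
    return '{"@context": "https://schema.org", "@type": "Recipe"}'


result = generate_with_failover("prompt text", flaky_call)
print(result["@type"])  # Recipe
```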
Validate & store
Plugin confirms the body is real JSON (never an HTML error page) and writes it with a fresh timestamp.
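The validate-then-store step might look like the sketch below. It is written in Python for illustration only (the plugin runs in WordPress); the function names are hypothetical, and the column names `json_ld` and `schema_last_updated` are taken from the table listed earlier.

```python
import json
from datetime import datetime, timezone


def validate_schema_body(body: str) -> dict:
    """Return the parsed JSON object, or raise ValueError if the body is unusable."""
    text = body.strip()
    if text.startswith("<"):
        # An HTML error page from a proxy or server, not JSON.
        raise ValueError("response looks like HTML, not JSON")
    data = json.loads(text)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data


def build_row_update(body: str) -> dict:
    """Build the column values to write, stamped with the current UTC time."""
    data = validate_schema_body(body)
    return {
        "json_ld": json.dumps(data),
        "schema_last_updated": datetime.now(timezone.utc).isoformat(),
    }


update = build_row_update('{"@context": "https://schema.org", "@type": "Article"}')
print(update["json_ld"])
```

Rejecting bad bodies before the write means a failed generation never overwrites a previously stored, valid schema.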
Visitor or AI crawler requests the page
A browser, Googlebot, or an AI answer engine (ChatGPT, Perplexity, Gemini…) loads example.com/the-post.
Inject the stored schema
The injector reads the row, re-validates the JSON, applies the </ → <\/ guard (so the stored JSON cannot close the script tag early), and prints the tag. No API call: just a single cheap DB read.
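A minimal sketch of the injection step, again in Python for illustration rather than as the plugin's real code. `render_json_ld_tag` is a hypothetical name. The escape relies on the fact that `\/` is a valid JSON escape for `/`, so the guarded payload still parses to the same data.

```python
import json


def render_json_ld_tag(stored: str) -> str:
    """Re-validate stored JSON and wrap it in a JSON-LD script tag.

    Returns an empty string when the stored value is not usable JSON,
    so a bad row never prints a broken tag.
    """
    try:
        data = json.loads(stored)
    except (TypeError, ValueError):
        return ""
    if not isinstance(data, (dict, list)):
        return ""
    # Guard: "</" inside the JSON could otherwise terminate the script element.
    body = json.dumps(data, ensure_ascii=False).replace("</", "<\\/")
    return f'<script type="application/ld+json">{body}</script>'


stored_value = json.dumps({"@type": "Article", "name": "x</script><b>"})
tag = render_json_ld_tag(stored_value)
print(tag)
```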
<script type="application/ld+json">
Machine-readable structured data now lives in the page <head> — exactly what search engines and LLMs parse to understand and cite the page.