Technical Documentation · 03

Data flow, start to finish

One vertical read of how a piece of content becomes structured data an AI can trust — split into build time (what the admin does) and request time (what a visitor or crawler triggers).

Legend: WordPress plugin · Python API (geoapipy.com) · Database (wp_gd78_GEOData) · Visitor / AI crawler

Build time · WordPress admin

Index the site

Admin picks post types and clicks Populate. Posts are read in batches of 20 and written as rows.

ajax_init_population · ajax_process_batch · GD78_DB::index_post()

one row per post
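The batching described above can be sketched roughly as follows. This is a Python illustration rather than the plugin's own PHP; `batched` and `index_posts` are hypothetical names, and only the batch size of 20 and the one-row-per-post rule are taken from the text.

```python
from typing import Dict, Iterator, List, Sequence

BATCH_SIZE = 20  # batch size stated in the text


def batched(posts: Sequence[Dict], size: int = BATCH_SIZE) -> Iterator[Sequence[Dict]]:
    """Yield consecutive slices of at most `size` posts."""
    for start in range(0, len(posts), size):
        yield posts[start:start + size]


def index_posts(posts: Sequence[Dict]) -> List[Dict]:
    """Hypothetical stand-in for the populate loop: one row per post."""
    rows = []
    for batch in batched(posts):
        for post in batch:
            # Real rows carry more columns (slug, post_type, template, ...).
            rows.append({"url": post["url"], "title": post["title"]})
    return rows
```

With 45 posts, this sketch processes three batches (20, 20, 5) and produces 45 rows.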

▤ wp_gd78_GEOData

url · slug · post_type · template · title · post_meta (whitelisted JSON, incl. resolved image_url) · classification · json_ld · schema_last_updated

Classify each row

Admin assigns a schema.org type (Recipe, Product, Article…) on the Classification screen or post meta box.

ajax_save_classification · SchemaTypes.json · ~75 types

changing a type clears stale JSON-LD
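One way to picture the "clear stale JSON-LD" rule is the sketch below. It models a row as a plain dictionary; `save_classification` here is a hypothetical Python stand-in, not the plugin's AJAX handler, and whether the real plugin also resets `schema_last_updated` is not stated, so the sketch leaves that column alone.

```python
from typing import Dict, Optional


def save_classification(row: Dict, new_type: Optional[str]) -> Dict:
    """Return an updated copy of `row`; drop cached JSON-LD if the type changed."""
    updated = dict(row)
    if updated.get("classification") != new_type:
        updated["classification"] = new_type
        # JSON-LD generated for the old type no longer matches, so clear it.
        updated["json_ld"] = None
    return updated
```

Re-saving the same type leaves the cached JSON-LD untouched, so an unchanged classification does not force a regeneration.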

Request generation

From the Schema Writer (one row or "generate all"), the plugin sends { url, classification, metadata } to the API.

ajax_get_and_save_json_ld → POST /GetJsonLd

HTTPS · 60s timeout
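The request can be approximated in Python with the standard library. The payload keys, the `/GetJsonLd` path, and the 60-second timeout come from the text; the full URL and the function names `build_request` and `request_json_ld` are assumptions made for illustration.

```python
import json
import urllib.request

# Host and path are named in the text; the exact URL is an assumption.
API_URL = "https://geoapipy.com/GetJsonLd"
TIMEOUT_SECONDS = 60


def build_request(url: str, classification: str, metadata: dict) -> urllib.request.Request:
    """Build the POST request carrying { url, classification, metadata }."""
    body = json.dumps(
        {"url": url, "classification": classification, "metadata": metadata}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_json_ld(url: str, classification: str, metadata: dict) -> str:
    """Send the request and return the raw response body as text."""
    req = build_request(url, classification, metadata)
    with urllib.request.urlopen(req, timeout=TIMEOUT_SECONDS) as resp:
        return resp.read().decode("utf-8")
```

Building the request separately from sending it keeps the payload easy to inspect without touching the network.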

⟶ INSIDE THE API · geoapipy.com · stateless FastAPI

Fetch

Fetch the page with httpx and verify that the response is HTML.

fetch_html()
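The HTML check might look something like the sketch below. It covers only the verification, not the httpx call itself, and it assumes the check is made on the Content-Type response header; the text does not say how `fetch_html()` actually decides. `is_html` is a hypothetical helper.

```python
from typing import Optional

HTML_MEDIA_TYPES = {"text/html", "application/xhtml+xml"}


def is_html(content_type: Optional[str]) -> bool:
    """Return True if a Content-Type header value names an HTML media type."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in HTML_MEDIA_TYPES
```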

Extract

Trafilatura isolates the main text; off-domain links are harvested.

extract_main_content()
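The link-harvesting half of this step can be illustrated with the standard library alone. The sketch does not use Trafilatura and is not the API's implementation; `LinkCollector` and `off_domain_links` are hypothetical names, and treating any different hostname (including a subdomain) as off-domain is an assumption.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def off_domain_links(html: str, page_url: str) -> List[str]:
    """Return unique http(s) links whose hostname differs from the page's."""
    page_host = urlparse(page_url).hostname
    collector = LinkCollector()
    collector.feed(html)
    seen, result = set(), []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, tel:, javascript:, ...
        if parsed.hostname != page_host and absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result
```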

Prompt

Load the type's property whitelist; assemble 10 hard rules.

build_prompt()
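As a rough picture of prompt assembly, the sketch below joins a type, its property whitelist, and a list of rules into one string. The prompt wording, the parameter names, and the sample values in the usage note are all hypothetical; the actual ten rules are not listed in this document and are not reproduced here.

```python
from typing import List


def build_prompt(
    schema_type: str,
    allowed_properties: List[str],
    rules: List[str],
    page_text: str,
) -> str:
    """Assemble a prompt from the type, its whitelist, and numbered rules."""
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    return (
        f"Generate schema.org JSON-LD of type {schema_type}.\n"
        f"Allowed properties: {', '.join(allowed_properties)}\n"
        f"Rules:\n{numbered}\n"
        f"Page content:\n{page_text}"
    )
```

For example, `build_prompt("Recipe", ["name", "recipeIngredient"], ["Return JSON only"], "Boil water.")` yields a prompt that names only the two whitelisted properties and one numbered rule.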

Generate

Call the models in order and validate that the response is a JSON object.

generate_json_ld()

failover gemini-3.1-flash-lite → gpt-oss-20b → gpt-5.4-nano · all fail → 502
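The failover chain can be sketched as a loop over the model order given above. The model names and the 502 outcome come from the text; `call_model` is a hypothetical callable standing in for the provider clients, and `AllModelsFailed` is an illustrative exception that a web framework would translate into the 502 response.

```python
import json
from typing import Callable, Sequence

# Order taken from the text.
MODEL_ORDER = ("gemini-3.1-flash-lite", "gpt-oss-20b", "gpt-5.4-nano")


class AllModelsFailed(Exception):
    """Raised when every model errors out or returns something unusable."""

    status_code = 502


def generate_with_failover(
    prompt: str,
    call_model: Callable[[str, str], str],
    models: Sequence[str] = MODEL_ORDER,
) -> dict:
    """Try each model in order; return the first response that is a JSON object."""
    for model in models:
        try:
            parsed = json.loads(call_model(model, prompt))
        except Exception:
            continue  # provider error or invalid JSON: move to the next model
        if isinstance(parsed, dict):
            return parsed
        # Valid JSON but not an object (e.g. a list): also fall through.
    raise AllModelsFailed("no model returned a JSON object")
```

Treating "valid JSON but not an object" the same as a provider error keeps a single exit path: either one model yields a usable object or the caller sees a 502.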

raw JSON-LD body

Validate & store

Plugin confirms the body is real JSON (never an HTML error page) and writes it with a fresh timestamp.

GD78_DB::update_json_ld() → column json_ld + schema_last_updated
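The validate-then-store step can be pictured as below. This is a Python sketch, not the plugin's PHP; `validate_body` and `store_json_ld` are hypothetical names, the row is modelled as a dictionary, and the UTC ISO-8601 timestamp format is an assumption.

```python
import json
from datetime import datetime, timezone
from typing import Dict


def validate_body(body: str) -> str:
    """Return the body if it parses as JSON; raise ValueError otherwise."""
    text = body.strip()
    if text.startswith("<"):
        # An HTML error page from a proxy or server, not JSON-LD.
        raise ValueError("response looks like HTML, not JSON")
    json.loads(text)  # json.JSONDecodeError is a subclass of ValueError
    return text


def store_json_ld(row: Dict, body: str) -> Dict:
    """Return a copy of `row` with validated JSON-LD and a fresh timestamp."""
    updated = dict(row)
    updated["json_ld"] = validate_body(body)
    updated["schema_last_updated"] = datetime.now(timezone.utc).isoformat()
    return updated
```

Because validation runs before anything is written, a failed response leaves the previously stored row unchanged.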

Request time · public page load

Visitor or AI crawler requests the page

A browser, Googlebot, or an AI answer engine (ChatGPT, Perplexity, Gemini…) loads example.com/the-post.

singular front-end view

WordPress fires wp_head

Inject the stored schema

The injector reads the row, re-validates the JSON, escapes every </ as <\/ so the payload cannot close the script tag early, and prints the tag. No API call: a single cheap DB read.

GD78_Public::output_json_ld() on wp_head

<script type="application/ld+json">
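The injection step, including the guard, can be sketched in a few lines. This is a Python illustration of the idea rather than the plugin's `output_json_ld()`; `render_json_ld_tag` is a hypothetical name, and printing nothing for invalid data is an assumption.

```python
import json
from typing import Optional

OPEN_TAG = '<script type="application/ld+json">'
CLOSE_TAG = "</script>"


def render_json_ld_tag(stored: Optional[str]) -> str:
    """Return the script tag for stored JSON-LD, or "" if it is missing or invalid."""
    if not stored:
        return ""
    try:
        json.loads(stored)  # re-validate before printing anything
    except ValueError:
        return ""
    # Guard: "</" inside a JSON string could end the script element early.
    # "\/" is a legal JSON escape for "/", so the data is unchanged.
    safe = stored.replace("</", "<\\/")
    return f"{OPEN_TAG}{safe}{CLOSE_TAG}"
```

The guard is lossless: a JSON parser reading the escaped payload recovers exactly the original string values.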

Machine-readable structured data now lives in the page <head> — exactly what search engines and LLMs parse to understand and cite the page.

The loop, in one line. Admin classifies → API generates once → result is cached in the database → every page load (human or AI) is served instantly from that cache, with no per-request model cost.
