GEO Index Docs

Technical Documentation · 03

Data flow, start to finish

One vertical read of how a piece of content becomes structured data an AI can trust — split into build time (what the admin does) and request time (what a visitor or crawler triggers).

WordPress plugin Python API (geoapipy.com) Database · wp_gd78_GEOData Visitor / AI crawler
Build time · WordPress admin
1

Index the site

Admin picks post types and clicks Populate. Posts are read in batches of 20 and written as rows.

ajax_init_populationajax_process_batchGD78_DB::index_post()
one row per post

▤ wp_gd78_GEOData

url · slug · post_type · template · title · post_meta (whitelisted JSON, incl. resolved image_url) · classification · json_ld · schema_last_updated

2

Classify each row

Admin assigns a schema.org type (Recipe, Product, Article…) on the Classification screen or post meta box.

ajax_save_classificationSchemaTypes.json · ~75 types
changing a type clears stale JSON-LD
3

Request generation

From the Schema Writer (one row or "generate all"), the plugin sends { url, classification, metadata } to the API.

ajax_get_and_save_json_ldPOST /GetJsonLd
HTTPS · 60s timeout

⟶ INSIDE THE API  ·  geoapipy.com  ·  stateless FastAPI

A
Fetch

Pull the page over httpx & verify it's HTML.

fetch_html()
B
Extract

Trafilatura isolates main text; off-domain links harvested.

extract_main_content()
C
Prompt

Load the type's property whitelist; assemble 10 hard rules.

build_prompt()
D
Generate

Call models in order; validate a JSON object returns.

generate_json_ld()

failover  gemini-3.1-flash-litegpt-oss-20bgpt-5.4-nano  ·  all fail → 502

raw JSON-LD body
4

Validate & store

Plugin confirms the body is real JSON (never an HTML error page) and writes it with a fresh timestamp.

GD78_DB::update_json_ld() → column json_ld + schema_last_updated
Request time · public page load
5

Visitor or AI crawler requests the page

A browser, Googlebot, or an AI answer engine (ChatGPT, Perplexity, Gemini…) loads example.com/the-post.

singular front-end view
WordPress fires wp_head
6

Inject the stored schema

The injector reads the row, re-validates the JSON, applies the </<\/ guard, and prints the tag. No API call — a single cheap DB read.

GD78_Public::output_json_ld() on wp_head

<script type="application/ld+json">

Machine-readable structured data now lives in the page <head> — exactly what search engines and LLMs parse to understand and cite the page.

The loop, in one line. Admin classifies → API generates once → result is cached in the database → every page load (human or AI) is served instantly from that cache, with no per-request model cost.

← Read the narrative overview  ·  Look up any function →