GEO Index Docs

Technical Documentation · 02

Function reference

Every function and method across both projects, grouped by file, with a one-line note on what it does and how it fits the system. Signatures are taken directly from the source.

Part A · GEOData WordPress plugin

C:\_DEV\Claude\GEOJson2\GEODataPlugin\ — PHP. All identifiers use the gd78_ prefix.

entry geodata.php

Bootstrap. Defines GD78_VERSION (1.7.0), GD78_DB_VERSION (1.5.0), and the plugin dir/URL constants; registers the activation/deactivation hooks; loads every class and calls GD78_Plugin::run().

class includes/class-gd78-plugin.php · GD78_Plugin

__construct()

Instantiates the loader and wires up both admin and public hook sets.

define_admin_hooks()

Registers the admin menu, asset enqueuing, every AJAX endpoint, the post meta box, and the CSV export handlers.

define_public_hooks()

Hooks GD78_Public::output_json_ld() onto wp_head so schema is emitted on the front end.

run()

Tells the loader to commit all queued hook registrations to WordPress.

class includes/class-gd78-loader.php · GD78_Loader

add_action($hook, $component, $callback, $priority, $accepted_args)

Queues a WordPress action registration.

add_filter($hook, $component, $callback, $priority, $accepted_args)

Queues a WordPress filter registration.

run()

Iterates the queues and registers everything with add_action/add_filter. A tiny indirection layer that keeps hook wiring in one place.

class includes/class-gd78-activator.php · GD78_Activator

activate()

Runs on plugin activation: creates the table and records the DB version.

maybe_upgrade()

Runs on every admin load; detects schema drift and fires the right migration.

create_table()

Idempotently creates/updates wp_gd78_GEOData with dbDelta().

backfill_post_meta()

Migration 1.1.0 — populates the new post_meta column for rows that predate it.

renarrow_post_meta()

Migration 1.4.0 — re-applies the current meta whitelist and injects the synthetic image_url.

class includes/class-gd78-deactivator.php · GD78_Deactivator

deactivate()

Deactivation hook. Intentionally a no-op for data — the table and all generated JSON-LD are preserved.

class includes/class-gd78-db.php · GD78_DB (data layer)

get_table_name() : string

Returns the prefixed table name wp_gd78_GEOData.

get_meta_whitelist() : array

The allow-list of post-meta keys worth capturing (Yoast, Rank Math, featured image, page template, …).

extract_relevant_meta($post_id)

Filters a post's meta down to the whitelist and injects the synthetic image_url. Produces the JSON that becomes the API's metadata.

resolve_image_url($post_id)

Three-tier image fallback: featured image → SEO social image → first real content image.

get_seo_image_url($post_id) private

Reads Yoast / Rank Math social-image meta keys.

extract_first_content_image($post_id) private

Regex-scans post_content for the first genuine <img>, skipping logos/icons.

is_chrome_image_src($src, $logo_url, $icon_url) private

Heuristic that rejects site-chrome images (logo, favicon, social icons).

get_all_records($args = [])

Fetches all rows with optional order-by, direction, and keyword search. Powers the admin tables.

upsert_record($data)

Inserts or updates a row (REPLACE semantics).

index_post($post_id)

Builds a full row from a WordPress post and upserts it. The unit of work for Phase 1 population.

delete_record($post_id)

Removes a single row by post ID.

count_posts_for_population($post_types)

Counts publishable posts of the chosen types — drives the progress bar total.

get_posts_for_population($post_types, $offset, $batch_size)

Returns one batch of posts for the population loop.

save_classification($post_id, $classification)

Cache gatekeeper. Updates the classification and, if it changed while JSON-LD exists, clears json_ld + schema_last_updated to force regeneration.
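
The gatekeeper rule above can be sketched as follows. The plugin itself is PHP; this Python version only illustrates the decision logic, and the Record class and its in-memory fields are hypothetical stand-ins for a table row.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """Hypothetical in-memory stand-in for one row of the plugin table."""
    classification: str = ""
    json_ld: Optional[str] = None
    schema_last_updated: Optional[str] = None


def save_classification(record: Record, classification: str) -> bool:
    """Store the new classification; return True if cached JSON-LD was cleared."""
    changed = record.classification != classification
    # Only invalidate when the type really changed AND there is something cached.
    cleared = changed and bool(record.json_ld)
    record.classification = classification
    if cleared:
        record.json_ld = None
        record.schema_last_updated = None
    return cleared
```

Re-saving the same classification leaves the cached JSON-LD untouched, so only a genuine type change forces regeneration.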

update_notes($post_id, $notes)

Writes the notes column — used to store the llm_prompt returned by the API Tester.

update_json_ld($post_id, $json_ld)

Persists generated JSON-LD and stamps schema_last_updated; returns the timestamp.

get_record($post_id)

Fetches one row by post ID. Used by both the admin AJAX paths and the public injector.

get_classified_records()

All rows that have a non-empty classification — the working set for the API Tester & Schema Writer.

class admin/class-gd78-admin.php · GD78_Admin (controllers + AJAX)

Menus & assets

add_plugin_admin_menu()

Creates the GEO Index menu and its sub-screens (Classification, API Tester, Schema Writer).

enqueue_styles($hook) · enqueue_scripts($hook)

Load the main screen's CSS/JS, passing the AJAX URL and nonce to the front end.

display_geo_index_page() · display_classification_page() · display_api_tester_page() · display_schema_writer_page()

Render the four admin screen templates from admin/partials/.

enqueue_classification_styles/scripts · enqueue_api_tester_assets · enqueue_schema_writer_assets ($hook)

Conditionally enqueue each screen's assets only on its own page.

Post meta box

register_meta_box() · render_meta_box($post) · enqueue_meta_box_assets($hook)

Adds the per-post GEO Index box to the editor for classifying and generating one post at a time.

AJAX — population

ajax_init_population()

Validates the chosen post types, counts the total, and stores a 30-minute session transient.

ajax_process_batch()

Indexes the next 20 posts via GD78_DB::index_post() and reports progress.

AJAX — classification

ajax_save_classification()

Saves one record's type and returns flags (e.g. JSON-LD was cleared).

ajax_apply_group_classification()

Bulk-applies a type to every record in a group.

AJAX — API integration

ajax_call_geoapipy()

POSTs to /LLM_Prompt_Tester (30 s timeout), extracts llm_prompt, and stores it in notes.

ajax_get_and_save_json_ld()

POSTs to /GetJsonLd (60 s timeout), validates the body is real JSON, and stores it via update_json_ld().

ajax_delete_record()

Deletes one record by ID.

CSV export

handle_export_csv()

admin-post handler — streams the whole table (id, url, classification, title, post_meta, notes).

handle_export_prompts()

Exports the notes column for Recipe-classified rows only.

stream_csv($filename, $columns, $rows) private

Emits Excel-safe CSV (UTF-8 BOM, CRLF).

fputcsv_excel($handle, $fields) private

fputcsv wrapper forcing CRLF line endings.

sanitize_csv_cell($value) private static

Coerces UTF-8, strips control chars, and neutralises formula-injection (= + - @).
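
A minimal sketch of the three sanitising steps, written in Python for illustration only (the plugin's own implementation is PHP). Prefixing an apostrophe to neutralise a leading formula character is an assumption here; the source does not say which neutralisation the plugin uses.

```python
import re

# Control characters to drop; tab, LF and CR are deliberately kept.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")
_FORMULA_LEADS = ("=", "+", "-", "@")


def sanitize_csv_cell(value) -> str:
    """Return a spreadsheet-safe string for one CSV cell."""
    text = value if isinstance(value, str) else str(value)
    # Coerce to valid UTF-8, replacing anything that cannot be encoded.
    text = text.encode("utf-8", errors="replace").decode("utf-8")
    text = _CONTROL_CHARS.sub("", text)
    # Assumption: a leading apostrophe makes the cell plain text in Excel.
    if text.startswith(_FORMULA_LEADS):
        text = "'" + text
    return text
```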

class includes/class-gd78-public.php · GD78_Public (front end)

output_json_ld()

Hooked to wp_head. On singular front-end views, fetches the stored JSON-LD, validates that it parses, applies the </ to <\/ escaping guard (so stored data cannot close the script element early), and prints the <script type="application/ld+json"> tag. Bails silently on any failure.
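
The validate-then-escape sequence can be sketched like this. It is a Python illustration of the logic, not the plugin's PHP, and render_json_ld_tag is a hypothetical name.

```python
import json
from typing import Optional


def render_json_ld_tag(stored: Optional[str]) -> str:
    """Return the script tag for stored JSON-LD, or '' on any failure."""
    if not stored:
        return ""
    try:
        json.loads(stored)  # bail silently if the stored value is not JSON
    except ValueError:
        return ""
    # Escape "</" so a value such as "</script>" cannot end the element early.
    safe = stored.replace("</", "<\\/")
    return f'<script type="application/ld+json">{safe}</script>'
```

Because \/ is a legal JSON escape for a slash, the escaped payload still parses to the same data.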

clean_json_ld($json_ld) private

Trims whitespace and defensively strips a stray Markdown code fence before output.


Part B · GetJsonLd FastAPI service

C:\_DEV\Claude\GEOPythonAPI\ — Python 3 / FastAPI. Stateless.

routes app/main.py

healthz() → dict

GET /healthz — returns {"status":"ok"} for monitoring.

async get_json_ld(req: GetJsonLdRequest) → dict

POST /GetJsonLd — orchestrates extract → prompt → LLM and returns the JSON-LD. Maps failures to HTTP 422 (ExtractionError, UnknownClassificationError), 502 (LLMRoutingError), or 500.
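
As a rough sketch of the failure mapping, using the exception names documented further down this page. The status_for helper is hypothetical, the base class of LLMRoutingError is assumed, and treating every other exception as 500 is an assumption; the real route may raise HTTP errors directly.

```python
class ExtractionError(RuntimeError):
    """Page could not be fetched, or no main content survived."""


class UnknownClassificationError(ValueError):
    """No whitelist file exists for the requested classification."""


class LLMRoutingError(RuntimeError):
    """Every model in the failover order failed (base class assumed)."""


def status_for(exc: Exception) -> int:
    """Map a pipeline exception to the HTTP status the route would return."""
    if isinstance(exc, (ExtractionError, UnknownClassificationError)):
        return 422  # the request cannot be processed as given
    if isinstance(exc, LLMRoutingError):
        return 502  # upstream models all failed
    return 500  # assumption: anything unexpected is an internal error
```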

async llm_prompt_tester(req: GetJsonLdRequest) → dict

POST /LLM_Prompt_Tester — same pipeline minus the model call; returns the extracted content, action links, and assembled llm_prompt for inspection.

model app/models.py

class GetJsonLdRequest(BaseModel)

Pydantic request schema: url: HttpUrl, classification: str, metadata: Optional[Dict]. Validates input before any work runs.

module app/extractor.py

fetch_html(url, *, timeout=20.0) → str

httpx GET with a browser UA, following redirects; raises ExtractionError on failure or non-HTML.

extract_main_content(html, url) → str

Trafilatura in precision mode (tables kept, links kept, ≥150 chars) to isolate the readable body.

_normalize_host(host) → str

Strips www. so internal vs. external links compare correctly.
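
A minimal sketch of the comparison this enables, using only the standard library. Lower-casing the host and the is_off_domain helper are assumptions added for illustration.

```python
from urllib.parse import urlsplit


def normalize_host(host: str) -> str:
    """Lower-case the host and drop a leading 'www.'."""
    host = host.strip().lower()
    return host[4:] if host.startswith("www.") else host


def is_off_domain(link: str, page_url: str) -> bool:
    """True when the link points at a different site than the page."""
    link_host = urlsplit(link).hostname or ""
    page_host = urlsplit(page_url).hostname or ""
    return normalize_host(link_host) != normalize_host(page_host)
```

Without the normalisation, a link to www.example.com on a page served from example.com would be misread as external.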

extract_action_links(html, page_url) → list[dict]

Drops nav/footer/aside/banner, then collects up to 50 unique off-domain anchors so the LLM has real URLs for URL-typed properties.
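
The collection step could look roughly like the sketch below. It uses the standard-library HTMLParser rather than whatever parser the service actually uses, treats the header element as the "banner", and returns dicts with url and text keys; all three choices are assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

SKIP_TAGS = {"nav", "footer", "aside", "header"}  # "header" stands in for banner
MAX_LINKS = 50


def _host(url: str) -> str:
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


class _LinkCollector(HTMLParser):
    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.page_host = _host(page_url)
        self.skip_depth = 0      # >0 while inside a skipped region
        self.links = []
        self._seen = set()
        self._current = None     # dict for the anchor currently open

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
            return
        if tag != "a" or self.skip_depth or len(self.links) >= MAX_LINKS:
            return
        href = dict(attrs).get("href")
        if not href:
            return
        url = urljoin(self.page_url, href)
        if urlsplit(url).scheme not in ("http", "https"):
            return
        if _host(url) == self.page_host or url in self._seen:
            return  # internal link or duplicate
        self._seen.add(url)
        self._current = {"url": url, "text": ""}
        self.links.append(self._current)

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "a" and self._current is not None:
            # Collapse runs of whitespace in the anchor text.
            self._current["text"] = " ".join(self._current["text"].split())
            self._current = None


def extract_action_links(html: str, page_url: str) -> list:
    parser = _LinkCollector(page_url)
    parser.feed(html)
    parser.close()
    return parser.links
```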

fetch_and_extract(url) → (str, list[dict])

Convenience orchestrator chaining fetch_html, extract_main_content, and extract_action_links.

class ExtractionError(RuntimeError)

Raised when a page can't be fetched or no main content survives — surfaces as HTTP 422.

module app/prompt_builder.py

load_allowed_properties(classification) → list[str]

Loads schemadata/fields/{Type}.json — the property whitelist for that schema.org type; raises UnknownClassificationError if missing.
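
A hedged sketch of the lookup. It assumes each whitelist file is a JSON array of property names (the file layout is not described here) and adds a hypothetical fields_dir parameter so the directory can be overridden.

```python
import json
from pathlib import Path


class UnknownClassificationError(ValueError):
    """No whitelist file exists for the requested classification."""


FIELDS_DIR = Path("schemadata/fields")


def load_allowed_properties(classification: str, fields_dir=FIELDS_DIR) -> list:
    """Return the property whitelist for one schema.org type."""
    path = Path(fields_dir) / f"{classification}.json"
    if not path.is_file():
        raise UnknownClassificationError(
            f"no property whitelist for classification {classification!r}"
        )
    # Assumption: the file holds a flat JSON array of property names.
    return list(json.loads(path.read_text(encoding="utf-8")))
```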

_truncate(text, limit=12000) → str

Caps extracted content at 12k chars with a truncation marker.
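
In outline, the cap might be applied like this. The marker text is hypothetical; the source only says that a truncation marker is appended.

```python
TRUNCATION_MARKER = "\n[... content truncated ...]"  # hypothetical wording


def truncate(text: str, limit: int = 12000) -> str:
    """Return text unchanged if short enough, else the first `limit` chars plus a marker."""
    if len(text) <= limit:
        return text
    return text[:limit] + TRUNCATION_MARKER
```

The marker tells the model that the page continued beyond what it was shown, which helps it avoid treating the cut-off as the real end of the content.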

_render_action_links(action_links) → str

Formats the off-domain links into a prompt block with usage guidance.

build_prompt(*, url, classification, metadata, extracted_content, action_links) → str

Assembles the final prompt: 10 hard SYSTEM_RULES, the allowed-property whitelist, the URL/type/metadata/content, the links block, and the task instruction.

SYSTEM_RULES const

The 10 rules that enforce valid JSON-LD, including object-only output, required @context/@type, a strict root-property whitelist, no fabricated facts, ISO-8601 dates, and treating metadata as authoritative.

class UnknownClassificationError(ValueError)

Raised when a classification has no whitelist file — surfaces as HTTP 422.

module app/llm_client.py

_strip_fences(text) → str

Removes ```/```json fences a model may wrap around its output.
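
One way to do this with a regular expression is sketched below. The pattern is an assumption, not the service's actual code; the fence string is built from single backticks so the example can itself sit inside a fenced block.

```python
import re

TICKS = "`" * 3  # a literal triple-backtick fence

# Opening fence with an optional language tag, the body, then a closing fence.
_FENCE = re.compile(
    r"^" + TICKS + r"[A-Za-z]*[ \t]*\n(.*?)\n?" + TICKS + r"$",
    re.DOTALL,
)


def strip_fences(text: str) -> str:
    """Return the text inside a wrapping code fence, or the text itself."""
    text = text.strip()
    match = _FENCE.match(text)
    return match.group(1).strip() if match else text
```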

async _call_model(client, model, prompt) → dict

One OpenRouter chat completion: sets auth + referer + title headers, posts the prompt, validates a JSON object comes back, or raises ModelCallError.

async generate_json_ld(prompt) → dict

Walks MODEL_FAILOVER_ORDER, returning the first success; raises LLMRoutingError with every model's reason if all fail.
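
The failover walk can be sketched as below. The call_model callable, the generate_with_failover name, and the shape of each failure entry are hypothetical; the real function takes only the prompt and reads the model order from config.

```python
import asyncio


class ModelCallError(RuntimeError):
    """One model failed; the caller should try the next one."""


class LLMRoutingError(RuntimeError):
    """All models failed; carries one entry per attempted model."""

    def __init__(self, failures):
        super().__init__("all models failed")
        self.failures = failures


async def generate_with_failover(prompt, models, call_model):
    """Try each model in order and return the first successful result."""
    failures = []
    for model in models:
        try:
            return await call_model(model, prompt)
        except ModelCallError as exc:
            # Record why this model failed, then fall through to the next.
            failures.append({"model": model, "reason": str(exc)})
    raise LLMRoutingError(failures)
```

Only ModelCallError triggers failover in this sketch, so an unexpected bug surfaces immediately instead of being hidden behind a retry.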

class ModelCallError · class LLMRoutingError

Single-model failure (triggers failover) vs. all-models-failed (carries the failures list → HTTP 502).

config app/config.py

OPENROUTER_* · MODEL_FAILOVER_ORDER

OpenRouter endpoint, key, referer/title headers, the 45s timeout, and the ordered model list (gemini-3.1-flash-lite → gpt-oss-20b → gpt-5.4-nano).

scripts scripts/build_schema_fields.py & scripts/test_extractor.py

main() (build_schema_fields)

Downloads the latest schema.org vocabulary and writes one property-whitelist file per target type into schemadata/fields/.

download_vocab() · build_indexes(vocab) · ancestors(type, parents) · fields_for_type(type, …)

Parse the vocabulary, map parent/child types, and collect every property valid on a type including inherited ones.
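
The inheritance walk could look roughly like this. The parents and fields_by_type dictionaries are hypothetical index shapes, and the extra fields_for_type parameters are assumptions, since the original signature is elided.

```python
def ancestors(type_name, parents):
    """Return every supertype of type_name, nearest first, without duplicates."""
    seen = []
    queue = list(parents.get(type_name, []))
    while queue:
        parent = queue.pop(0)
        if parent not in seen:
            seen.append(parent)
            queue.extend(parents.get(parent, []))
    return seen


def fields_for_type(type_name, parents, fields_by_type):
    """Collect the type's own properties plus everything it inherits."""
    props = set(fields_by_type.get(type_name, []))
    for parent in ancestors(type_name, parents):
        props.update(fields_by_type.get(parent, []))
    return sorted(props)
```

For example, with Recipe → HowTo → CreativeWork → Thing as the parent chain, Recipe picks up name from Thing even though Recipe never declares it directly.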

strip_prefix() · as_id_list() · as_type_list() · load_target_types()

JSON-LD helpers that normalise schema.org node IDs/types and read the target type list from SchemaTypes.json.

main() (test_extractor)

Regression harness: fetches a known page and asserts the extractor still finds the expected action link.
