Technical Documentation · 02
Function reference
Every function and method across both projects, grouped by file, with a one-line note on what it does and how it fits the system. Signatures are taken directly from the source.
Part A · GEOData WordPress plugin
C:\_DEV\Claude\GEOJson2\GEODataPlugin\ — PHP. All identifiers use the gd78_ prefix.
Bootstrap. Defines GD78_VERSION (1.7.0), GD78_DB_VERSION (1.5.0), and the plugin dir/URL constants; registers the activation/deactivation hooks; loads every class and calls GD78_Plugin::run().
__construct()
Instantiates the loader and wires up both admin and public hook sets.
define_admin_hooks()
Registers the admin menu, asset enqueuing, every AJAX endpoint, the post meta box, and the CSV export handlers.
define_public_hooks()
Hooks GD78_Public::output_json_ld() onto wp_head so schema is emitted on the front end.
run()
Tells the loader to commit all queued hook registrations to WordPress.
add_action($hook, $component, $callback, $priority, $accepted_args)
Queues a WordPress action registration.
add_filter($hook, $component, $callback, $priority, $accepted_args)
Queues a WordPress filter registration.
run()
Iterates the queues and registers everything with add_action/add_filter. A tiny indirection layer that keeps hook wiring in one place.
activate()
Runs on plugin activation: creates the table and records the DB version.
maybe_upgrade()
Runs on every admin load; detects schema drift and runs the matching migration.
create_table()
Idempotently creates/updates wp_gd78_GEOData with dbDelta().
backfill_post_meta()
Migration 1.1.0 — populates the new post_meta column for rows that predate it.
renarrow_post_meta()
Migration 1.4.0 — re-applies the current meta whitelist and injects the synthetic image_url.
deactivate()
Deactivation hook. Intentionally a no-op for data — the table and all generated JSON-LD are preserved.
get_table_name() : string
Returns the prefixed table name wp_gd78_GEOData.
get_meta_whitelist() : array
The allow-list of post-meta keys worth capturing (Yoast, Rank Math, featured image, page template, …).
extract_relevant_meta($post_id)
Filters a post's meta down to the whitelist and injects the synthetic image_url. Produces the JSON that becomes the API's metadata.
resolve_image_url($post_id)
Three-tier image fallback: featured image → SEO social image → first real content image.
get_seo_image_url($post_id) private
Reads Yoast / Rank Math social-image meta keys.
extract_first_content_image($post_id) private
Regex-scans post_content for the first genuine <img>, skipping logos/icons.
is_chrome_image_src($src, $logo_url, $icon_url) private
Heuristic that rejects site-chrome images (logo, favicon, social icons).
get_all_records($args = [])
Fetches all rows with optional order-by, direction, and keyword search. Powers the admin tables.
upsert_record($data)
Inserts or updates a row (REPLACE semantics).
index_post($post_id)
Builds a full row from a WordPress post and upserts it. The unit of work for Phase 1 population.
delete_record($post_id)
Removes a single row by post ID.
count_posts_for_population($post_types)
Counts publishable posts of the chosen types — drives the progress bar total.
get_posts_for_population($post_types, $offset, $batch_size)
Returns one batch of posts for the population loop.
save_classification($post_id, $classification)
Cache gatekeeper. Updates the classification and, if it changed while JSON-LD exists, clears json_ld + schema_last_updated to force regeneration.
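The gatekeeper rule can be sketched as follows. This is an illustration in Python rather than the plugin's PHP, and the dict-based record is a hypothetical stand-in for a table row; only the column names classification, json_ld, and schema_last_updated come from the description above.

```python
def save_classification(record: dict, classification: str) -> bool:
    """Update the classification; return True if stored JSON-LD was cleared.

    `record` is a hypothetical in-memory stand-in for one table row.
    """
    changed = record.get("classification") != classification
    has_json_ld = bool(record.get("json_ld"))

    record["classification"] = classification

    # Only invalidate when the type really changed AND there is cached output.
    if changed and has_json_ld:
        record["json_ld"] = None
        record["schema_last_updated"] = None
        return True
    return False
```

Re-saving the same type leaves the cached JSON-LD alone, so only a real change of type forces regeneration.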
update_notes($post_id, $notes)
Writes the notes column — used to store the llm_prompt returned by the API Tester.
update_json_ld($post_id, $json_ld)
Persists generated JSON-LD and stamps schema_last_updated; returns the timestamp.
get_record($post_id)
Fetches one row by post ID. Used by both the admin AJAX paths and the public injector.
get_classified_records()
All rows that have a non-empty classification — the working set for the API Tester & Schema Writer.
Menus & assets
add_plugin_admin_menu()
Creates the GEO Index menu and its sub-screens (Classification, API Tester, Schema Writer).
enqueue_styles($hook) · enqueue_scripts($hook)
Load the main screen's CSS/JS and pass the AJAX URL and nonce to the browser-side script.
display_geo_index_page() · display_classification_page() · display_api_tester_page() · display_schema_writer_page()
Render the four admin screen templates from admin/partials/.
enqueue_classification_styles/scripts · enqueue_api_tester_assets · enqueue_schema_writer_assets ($hook)
Enqueue each screen's assets only when that screen is the current admin page.
Post meta box
register_meta_box() · render_meta_box($post) · enqueue_meta_box_assets($hook)
Adds the per-post GEO Index box to the editor for classifying and generating one post at a time.
AJAX — population
ajax_init_population()
Validates the chosen post types, counts the total, and stores a 30-minute session transient.
ajax_process_batch()
Indexes the next 20 posts via GD78_DB::index_post() and reports progress.
AJAX — classification
ajax_save_classification()
Saves one record's type and returns status flags (e.g. whether stored JSON-LD was cleared).
ajax_apply_group_classification()
Bulk-applies a type to every record in a group.
AJAX — API integration
ajax_call_geoapipy()
POSTs to /LLM_Prompt_Tester (30 s timeout), extracts llm_prompt from the response, and stores it in notes.
ajax_get_and_save_json_ld()
POSTs to /GetJsonLd (60 s timeout), checks that the response body parses as JSON, and stores it via update_json_ld().
ajax_delete_record()
Deletes one record by ID.
CSV export
handle_export_csv()
admin-post handler — streams the whole table (id, url, classification, title, post_meta, notes).
handle_export_prompts()
Exports the notes column for Recipe-classified rows only.
stream_csv($filename, $columns, $rows) private
Emits Excel-safe CSV (UTF-8 BOM, CRLF).
fputcsv_excel($handle, $fields) private
fputcsv wrapper forcing CRLF line endings.
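For reference, the Excel-safe byte layout (UTF-8 BOM first, CRLF after every row) looks like this when produced with Python's standard csv module. This is a sketch of the output format only, not the plugin's PHP code; render_csv is a hypothetical name.

```python
import csv
import io

UTF8_BOM = "\ufeff".encode("utf-8")  # the three bytes EF BB BF


def render_csv(columns, rows) -> bytes:
    """Return CSV bytes with a UTF-8 BOM and CRLF line endings."""
    buffer = io.StringIO(newline="")  # newline="" stops Python rewriting CRLF
    writer = csv.writer(buffer, lineterminator="\r\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return UTF8_BOM + buffer.getvalue().encode("utf-8")
```

The BOM is what lets Excel detect UTF-8 instead of falling back to a legacy code page.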
sanitize_csv_cell($value) private static
Coerces UTF-8, strips control chars, and neutralises formula-injection (= + - @).
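A minimal Python sketch of the same cell hygiene follows. Two details are assumptions, not taken from the plugin source: the apostrophe prefix used to neutralise a leading = + - @ (a common mitigation), and the exact set of control characters removed (tab, LF, and CR are kept here).

```python
import re

FORMULA_LEADERS = ("=", "+", "-", "@")
# Assumption: strip C0 control characters and DEL, but keep tab, LF and CR.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def sanitize_csv_cell(value) -> str:
    """Return a string that a spreadsheet will treat as plain text."""
    text = "" if value is None else str(value)
    text = CONTROL_CHARS.sub("", text)
    # A leading formula character would make Excel evaluate the cell.
    if text.startswith(FORMULA_LEADERS):
        text = "'" + text
    return text
```

One side effect of this approach is that negative numbers such as -5 are also prefixed, since the check cannot tell them apart from a formula.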
output_json_ld()
Hooked to wp_head. On singular front-end views, fetches the stored JSON-LD, checks that it parses, escapes every </ as <\/ so the payload cannot close its own script element, and prints the <script type="application/ld+json"> tag. Bails silently on any failure.
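The escaping step is the part most worth seeing concretely. The sketch below is Python rather than the plugin's PHP, and render_json_ld_tag is a hypothetical name; it shows why the replacement is safe: in valid JSON the sequence </ can only occur inside a string, and \/ is a legal JSON escape for /.

```python
import json


def render_json_ld_tag(stored: str) -> str:
    """Return a script tag for stored JSON-LD, or "" if it does not parse."""
    try:
        json.loads(stored)
    except (TypeError, ValueError):
        return ""  # bail silently, emit nothing
    # Escape "</" so a value like "</script>" cannot end the element early.
    safe = stored.strip().replace("</", "<\\/")
    return f'<script type="application/ld+json">{safe}</script>'
```

The escaped payload still decodes to the original values, so consumers of the JSON-LD see no difference.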
clean_json_ld($json_ld) private
Trims whitespace and defensively strips a stray Markdown code fence before output.
Part B · GetJsonLd FastAPI service
C:\_DEV\Claude\GEOPythonAPI\ — Python 3 / FastAPI. Stateless.
healthz() → dict
GET /healthz — returns {"status":"ok"} for monitoring.
async get_json_ld(req: GetJsonLdRequest) → dict
POST /GetJsonLd — orchestrates extract → prompt → LLM and returns the JSON-LD. Maps failures to 422 / 500 / 502.
async llm_prompt_tester(req: GetJsonLdRequest) → dict
POST /LLM_Prompt_Tester — same pipeline minus the model call; returns the extracted content, action links, and assembled llm_prompt for inspection.
class GetJsonLdRequest(BaseModel)
Pydantic request schema: url: HttpUrl, classification: str, metadata: Optional[Dict]. Validates input before any work runs.
fetch_html(url, *, timeout=20.0) → str
httpx GET with a browser UA, following redirects; raises ExtractionError on failure or non-HTML.
extract_main_content(html, url) → str
Trafilatura in precision mode (tables kept, links kept, ≥150 chars) to isolate the readable body.
_normalize_host(host) → str
Strips www. so internal vs. external links compare correctly.
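A minimal sketch of the comparison this enables, using only the standard library. The lowercasing and the is_off_domain helper are assumptions added for illustration; the source only states that www. is stripped.

```python
from urllib.parse import urlsplit


def normalize_host(host: str) -> str:
    """Lowercase a hostname and drop a leading "www."."""
    host = (host or "").strip().lower()
    return host[4:] if host.startswith("www.") else host


def is_off_domain(link_url: str, page_url: str) -> bool:
    """True when the link points at a different site than the page (hypothetical helper)."""
    link_host = normalize_host(urlsplit(link_url).hostname or "")
    page_host = normalize_host(urlsplit(page_url).hostname or "")
    return link_host != page_host
```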
extract_action_links(html, page_url) → list[dict]
Drops nav/footer/aside/banner, then collects up to 50 unique off-domain anchors so the LLM has real URLs for URL-typed properties.
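The collection logic can be approximated with the standard-library HTML parser. This is a sketch under stated assumptions, not the service's implementation: the real code may use a different parser, the {"url", "text"} shape of each dict is invented for illustration, and banner handling is omitted because the source does not say how banners are identified.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

SKIP_TAGS = {"nav", "footer", "aside"}  # banner detection omitted (unspecified)
MAX_LINKS = 50


def _host(url: str) -> str:
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


class _LinkCollector(HTMLParser):
    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.page_host = _host(page_url)
        self.skip_depth = 0   # > 0 while inside a skipped container
        self.seen = set()     # absolute URLs already collected
        self.links = []
        self._open = None     # link dict currently receiving anchor text

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "a" and not self.skip_depth:
            self._start_link(dict(attrs).get("href"))

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag == "a":
            self._open = None

    def handle_data(self, data):
        if self._open is not None:
            self._open["text"] = (self._open["text"] + " " + data.strip()).strip()

    def _start_link(self, href):
        if not href or len(self.links) >= MAX_LINKS:
            return
        url = urljoin(self.page_url, href)  # resolve relative hrefs
        if urlsplit(url).scheme not in ("http", "https"):
            return
        if _host(url) == self.page_host or url in self.seen:
            return
        self.seen.add(url)
        self._open = {"url": url, "text": ""}
        self.links.append(self._open)


def extract_action_links(html: str, page_url: str) -> list:
    collector = _LinkCollector(page_url)
    collector.feed(html)
    return collector.links
```

Keeping the anchor text alongside each URL gives the model a hint about what the link is for, for example a booking or ordering page.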
fetch_and_extract(url) → (str, list[dict])
Convenience orchestrator that chains fetch_html, extract_main_content, and extract_action_links, returning the extracted text and the link list.
class ExtractionError(RuntimeError)
Raised when a page can't be fetched or no main content survives — surfaces as HTTP 422.
load_allowed_properties(classification) → list[str]
Loads schemadata/fields/{Type}.json — the property whitelist for that schema.org type; raises UnknownClassificationError if missing.
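A minimal sketch of the lookup. The file format (a JSON array of property names), the fields_dir parameter, and the alphanumeric guard against path tricks are assumptions for illustration; the source only names the path pattern and the error type.

```python
import json
from pathlib import Path


class UnknownClassificationError(ValueError):
    """No whitelist file exists for the requested schema.org type."""


def load_allowed_properties(classification: str, fields_dir="schemadata/fields") -> list:
    # Assumption: type names are plain alphanumerics, so anything else
    # (for example "../secrets") is rejected before touching the disk.
    if not classification.isalnum():
        raise UnknownClassificationError(f"invalid classification {classification!r}")
    path = Path(fields_dir) / f"{classification}.json"
    if not path.is_file():
        raise UnknownClassificationError(f"no whitelist for {classification!r}")
    return json.loads(path.read_text(encoding="utf-8"))
```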
_truncate(text, limit=12000) → str
Caps extracted content at 12k chars with a truncation marker.
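A minimal sketch of the cap. The marker wording is an assumption; the source only says that a truncation marker is appended.

```python
# Assumption: the actual marker text used by the service is not documented here.
TRUNCATION_MARKER = "\n[... content truncated ...]"


def truncate(text: str, limit: int = 12000) -> str:
    """Return text unchanged if short enough, else the first `limit` chars plus a marker."""
    if len(text) <= limit:
        return text
    return text[:limit].rstrip() + TRUNCATION_MARKER
```

The marker tells the model that the page continues, which discourages it from treating the cut-off point as the real end of the content.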
_render_action_links(action_links) → str
Formats the off-domain links into a prompt block with usage guidance.
build_prompt(*, url, classification, metadata, extracted_content, action_links) → str
Assembles the final prompt: 10 hard SYSTEM_RULES, the allowed-property whitelist, the URL/type/metadata/content, the links block, and the task instruction.
SYSTEM_RULES const
The 10 rules that enforce valid JSON-LD, including object-only output, required @context/@type, a strict root-property whitelist, no fabricated facts, ISO-8601 dates, and treating the supplied metadata as authoritative.
class UnknownClassificationError(ValueError)
Raised when a classification has no whitelist file — surfaces as HTTP 422.
_strip_fences(text) → str
Removes ```/```json fences a model may wrap around its output.
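One way to implement this with a single regular expression is sketched below. The pattern is an assumption about what "fence" covers (an optional language tag after the opening backticks); the real helper may be stricter or looser.

```python
import re

TICKS = "`" * 3  # built indirectly so this example can itself sit inside a fence
FENCED = re.compile(
    r"^\s*" + TICKS + r"[A-Za-z0-9_-]*[ \t]*\r?\n(.*?)\r?\n?" + TICKS + r"\s*$",
    re.DOTALL,
)


def strip_fences(text: str) -> str:
    """Return the body of a fenced block, or the trimmed text if there is no fence."""
    match = FENCED.match(text)
    return match.group(1).strip() if match else text.strip()
```

Unfenced output passes through untouched apart from trimming, so the helper is safe to apply to every model response.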
async _call_model(client, model, prompt) → dict
One OpenRouter chat completion: sets auth + referer + title headers, posts the prompt, validates a JSON object comes back, or raises ModelCallError.
async generate_json_ld(prompt) → dict
Walks MODEL_FAILOVER_ORDER, returning the first success; raises LLMRoutingError with every model's reason if all fail.
class ModelCallError · class LLMRoutingError
Single-model failure (triggers failover) vs. all-models-failed (carries the failures list → HTTP 502).
OPENROUTER_* · MODEL_FAILOVER_ORDER
OpenRouter endpoint, key, referer/title headers, the 45s timeout, and the ordered model list (gemini-3.1-flash-lite → gpt-oss-20b → gpt-5.4-nano).
main() (build_schema_fields)
Downloads the latest schema.org vocabulary and writes one property-whitelist file per target type into schemadata/fields/.
download_vocab() · build_indexes(vocab) · ancestors(type, parents) · fields_for_type(type, …)
Parse the vocabulary, map parent/child types, and collect every property valid on a type including inherited ones.
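Inherited-property collection can be sketched on plain dicts. The index shapes used here (parents mapping a type to its direct supertypes, props_by_type mapping a type to its directly declared properties) and the three-argument signature are assumptions; the source elides the real parameter list.

```python
def ancestors(type_name, parents):
    """Return every supertype of type_name, each listed once."""
    found = []
    stack = list(parents.get(type_name, []))
    while stack:
        parent = stack.pop()
        if parent not in found:  # schema.org allows multiple inheritance
            found.append(parent)
            stack.extend(parents.get(parent, []))
    return found


def fields_for_type(type_name, parents, props_by_type):
    """Return the sorted union of own and inherited properties."""
    fields = set(props_by_type.get(type_name, []))
    for ancestor in ancestors(type_name, parents):
        fields.update(props_by_type.get(ancestor, []))
    return sorted(fields)
```

For a chain such as Recipe → HowTo → CreativeWork → Thing, the result contains the properties declared on all four types.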
strip_prefix() · as_id_list() · as_type_list() · load_target_types()
JSON-LD helpers that normalise schema.org node IDs/types and read the target type list from SchemaTypes.json.
main() (test_extractor)
Regression harness: fetches a known page and asserts the extractor still finds the expected action link.