Case study · Live demonstrator

A living guide to Siem Reap, run by a machine.

Angkor Now is a tourism & culture guide for Cambodia’s temple town: events, venues, festivals, maps and guides. Behind its quiet editorial surface sits an autonomous ingestion pipeline that reads the web, extracts structure with an LLM, de-duplicates, and stages everything for a human before it ever goes live.

Type
City guide · SvelteKit
Backend
LLM intake pipeline
Status
Live · prerendered
176 venues mapped
AI-ingested · human-approved
angkornow.com
Angkor Now homepage
The brief

One town, a hundred sources, no staff.

Siem Reap’s nightlife, events and venue scene lives scattered across Facebook pages, partner listings and word-of-mouth. No single source is complete, none are structured, and most are out of date within a week.

Angkor Now’s premise: a city guide that maintains itself. The site continuously reads those messy sources, uses a language model to pull clean structured records out of them, and routes everything through automated de-duplication and a single human approval gate before publishing, so the public site stays trustworthy while the grind stays automated.

The result is a fast, warm, editorial product that reads like it has a full newsroom behind it, built and sponsored by AI Boffin Hub as a functional demonstrator of exactly that idea.

“The scraper never touches the database. It writes JSON. A human decides what becomes real.”
176
Venues mapped & geocoded
100%
Pages prerendered & static
2
Dedup algorithms · exact + fuzzy
0
DB writes from the scraper
1
Human review gate before publish
The product

A guide that feels hand-edited.

Cormorant Garamond headlines, warm paper, a forest-and-turmeric palette and theme switching: every surface is editorial first, database second. A tour of what a visitor actually sees.

angkornow.com/tonight
Tonight page
01 / What's on now

“What’s on tonight.”

The default question a traveller asks, answered first. A timezone-correct timeline of tonight’s events: all-day events first, then the rest by start time, with category and price tags inline. It rolls straight into “coming up next” for tomorrow.

Asia/Phnom_Penh aware · Live timeline · Tag & price badges
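The ordering on this page (all-day entries first, then by start time) can be sketched in a few lines. This is only an illustration: the `TimelineEvent` shape and its field names are assumptions, since the site's real schema is not shown here.

```typescript
// Hypothetical event shape; the real events schema is not published in this case study.
interface TimelineEvent {
  title: string;
  allDay: boolean;
  startsAt: string; // ISO 8601 instant, e.g. "2025-01-10T12:00:00Z"
}

// Order a night's events: all-day entries first, then ascending start time.
// Returns a new array and leaves the input untouched.
function orderTimeline(events: TimelineEvent[]): TimelineEvent[] {
  return [...events].sort((a, b) => {
    if (a.allDay !== b.allDay) return a.allDay ? -1 : 1;
    return Date.parse(a.startsAt) - Date.parse(b.startsAt);
  });
}
```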
angkornow.com/events
Events listing
02 / The events feed

Every event, filterable.

The full directory of upcoming events with faceted filtering by category, date and venue. Each record is an AI-extracted, human-approved entry: recurring nights, ticketed shows and free meetups all normalised into one schema.

Faceted filters · Recurring support · Past-event records
angkornow.com/calendar
Calendar view
03 / Plan ahead

A month at a glance.

The same event corpus, pivoted into a monthly calendar for trip planning. Days light up with what’s happening; click through to the day or the event. Built from the identical prerendered data, with no second source of truth.

Month grid · Single data source · Deep links
angkornow.com/venues
Venues directory
04 / The directory

Bars, restaurants, culture.

A browsable venue directory across bars, restaurants, cafés, music and entertainment venues. Each has its own prerendered detail page, opening hours, neighbourhood and a generated QR code for sharing on the ground.

Per-venue pages · Neighbourhoods · QR share codes
angkornow.com/map
Interactive clustered map
05 / Find a venue

An interactive map, clustered.

All 176 venues plotted on a MapLibre GL map over CARTO tiles, with marker clustering, category-coloured pins, an “Open Now” filter and a neighbourhoods overlay. The side rail and map stay in sync as you filter and search.

MapLibre GL · Marker clustering · Open-now filter
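An “Open Now” filter has to compare the current Siem Reap clock against each venue's hours, including bars that close after midnight. The sketch below shows one way to do that with the built-in `Intl.DateTimeFormat`; the `Hours` shape (minutes after local midnight, keyed by weekday) is an assumption, not the site's actual data model.

```typescript
const OPEN_NOW_TZ = 'Asia/Phnom_Penh';
const WEEKDAYS = ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'];

// Hypothetical opening-hours shape: minutes after local midnight, keyed by
// weekday (0 = Sunday). A `close` at or below `open` means "closes after midnight".
type Hours = Record<number, { open: number; close: number } | undefined>;

// Weekday and minutes-since-midnight of an instant, as seen in Siem Reap.
function localClock(now: Date): { weekday: number; minutes: number } {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone: OPEN_NOW_TZ,
    weekday: 'short',
    hour: '2-digit',
    minute: '2-digit',
    hourCycle: 'h23',
  }).formatToParts(now);
  const get = (type: string) => parts.find((p) => p.type === type)?.value ?? '';
  return {
    weekday: WEEKDAYS.indexOf(get('weekday')),
    minutes: Number(get('hour')) * 60 + Number(get('minute')),
  };
}

function isOpenNow(hours: Hours, now: Date = new Date()): boolean {
  const { weekday, minutes } = localClock(now);
  const today = hours[weekday];
  if (today) {
    const overnight = today.close <= today.open;
    if (!overnight && minutes >= today.open && minutes < today.close) return true;
    if (overnight && minutes >= today.open) return true;
  }
  // Still inside yesterday's overnight session?
  const yesterday = hours[(weekday + 6) % 7];
  return !!yesterday && yesterday.close <= yesterday.open && minutes < yesterday.close;
}
```

A bar listed as open Friday 17:00 to 02:00 is therefore still open at 01:30 on Saturday, even though it has no Saturday entry of its own.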
angkornow.com/guides
Editorial guides
06 / Editorial guides

Long-form, curated.

Hand-shaped guides (the best rooftop bars, apsara dance, getting around, family-friendly days, healthcare & emergencies) that wrap the structured data in human context. This is where the machine-run guide still reads like a person wrote it.

Topic guides · Family-friendly · Healthcare & safety
angkornow.com/search
Search across the directory
07 / Find anything

“Find anything.”

One search box across venues, events and guides, with suggestion chips for common intents: cocktail bar, apsara dance, kandal village, festival. Below it, a persistent emergency band keeps the Royal Angkor Hospital line one glance away.

Unified search · Intent chips · Emergency band
Under the surface

Advanced features, quietly done.

The polish a traveller never notices but always benefits from: correctness, speed and resilience baked into the architecture.

Timezone-correct everything

All dates normalise to Asia/Phnom_Penh at intake, so “tonight” means tonight in Siem Reap regardless of where the visitor or the server sits.

Normalise stage
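To make “tonight means tonight in Siem Reap” concrete, here is a minimal sketch using only the standard `Intl.DateTimeFormat` API. The function names are illustrative and are not taken from the project's code.

```typescript
const SIEM_REAP_TZ = 'Asia/Phnom_Penh';

// Calendar date (YYYY-MM-DD) of an instant as seen in Siem Reap,
// independent of the server's or visitor's own timezone.
function localDayKey(instant: Date): string {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone: SIEM_REAP_TZ,
    year: 'numeric',
    month: '2-digit',
    day: '2-digit',
  }).formatToParts(instant);
  const get = (type: string) => parts.find((p) => p.type === type)?.value ?? '';
  return `${get('year')}-${get('month')}-${get('day')}`;
}

// True when the event starts on the same local calendar day as `now`.
function isTonight(startsAt: string, now: Date = new Date()): boolean {
  return localDayKey(new Date(startsAt)) === localDayKey(now);
}
```

An event stored as `2025-01-10T18:30:00Z` starts at 01:30 on 11 January local time, so it belongs to the 11th regardless of where the code runs.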

Dual-algorithm dedup

An exact key match catches re-scrapes; a fuzzy title-date-venue similarity score catches the same event described two different ways across sources.

Insert · merge · stage

Marker clustering at scale

MapLibre GL clusters the geocoded venue pins into readable groups, recolouring and recounting live as you zoom, pan, filter by category or toggle open-now.

MapLibre · CARTO
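In MapLibre GL, clustering is switched on at the GeoJSON source and rendered with layers filtered on the generated `point_count` property. The sketch below builds those definitions as plain objects; the ids, radius, zoom limit and the `colour` feature property are assumptions for illustration, not the site's actual configuration.

```typescript
// Source definition in the shape accepted by map.addSource('venues', ...).
// cluster, clusterRadius and clusterMaxZoom are standard GeoJSON source options.
const venueSource = (data: unknown) => ({
  type: 'geojson' as const,
  data,
  cluster: true,
  clusterRadius: 50,  // assumed value, in pixels
  clusterMaxZoom: 14, // assumed value; pins separate above this zoom
});

// Cluster bubbles: only features that carry a point_count.
const clusterLayer = {
  id: 'venue-clusters',
  type: 'circle' as const,
  source: 'venues',
  filter: ['has', 'point_count'],
  paint: { 'circle-radius': ['step', ['get', 'point_count'], 16, 10, 22, 50, 28] },
};

// Count label drawn on top of each cluster bubble.
const clusterCountLayer = {
  id: 'venue-cluster-count',
  type: 'symbol' as const,
  source: 'venues',
  filter: ['has', 'point_count'],
  layout: { 'text-field': ['get', 'point_count_abbreviated'], 'text-size': 12 },
};

// Individual pins, coloured from a hypothetical per-feature `colour` property.
const pinLayer = {
  id: 'venue-pins',
  type: 'circle' as const,
  source: 'venues',
  filter: ['!', ['has', 'point_count']],
  paint: { 'circle-radius': 6, 'circle-color': ['get', 'colour'] },
};
```

Because clusters are computed from the source data rather than from layer filters, a category or open-now filter would typically pass a filtered FeatureCollection to the source's `setData` so that the counts update.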

Fully prerendered

SvelteKit prerenders every route to static HTML on deploy: instant first paint, no cold database calls in the request path, trivially cacheable on a CDN.

SSG · Netlify CDN

Theme & dark mode

A three-accent theme system (forest, turmeric, oxblood) plus light/dark, all driven by CSS custom properties set in one token file. No hardcoded brand colours anywhere.

CSS custom properties

QR & sharing built in

Venue pages generate QR codes server-side for on-the-ground sharing, with clean Open Graph metadata so any link unfurls into a proper card.

Server-generated
The AI backend

How the guide reads the web for itself.

A seven-stage intake pipeline turns messy public sources into clean, de-duplicated, human-approved records. Every stage is isolated, and the one irreversible action, publishing, always needs a person.

01
Source fetch
fetch-source.ts · pluggable fetchers
Each source (partner listings, Facebook pages via Apify actors, curated feeds) has its own fetcher returning a common RawEvent[] shape. New sources plug in without touching the rest of the pipeline.
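One way to get that plug-in behaviour is a small registry keyed by source name. This is a sketch only: the fields on `RawEvent` and the `registerFetcher` helper are assumptions, since the case study names the `RawEvent[]` shape but not its contents.

```typescript
// Hypothetical fields; only the RawEvent[] name comes from the pipeline description.
interface RawEvent {
  sourceId: string;   // which source produced the item
  externalId: string; // the item's id within that source
  text: string;       // raw, unstructured content for later extraction
}

type Fetcher = () => Promise<RawEvent[]>;

const fetchers = new Map<string, Fetcher>();

// Adding a source means registering one function; nothing downstream changes.
function registerFetcher(source: string, fetcher: Fetcher): void {
  fetchers.set(source, fetcher);
}

async function fetchSource(source: string): Promise<RawEvent[]> {
  const fetcher = fetchers.get(source);
  if (!fetcher) throw new Error(`No fetcher registered for "${source}"`);
  return fetcher();
}
```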
02
Read-only scrape
scrape-events.ts · JSON out, zero writes
The scraper is deliberately powerless: it only writes a JSON file and never touches the database. Scraping and persistence are fully decoupled, so a bad scrape can never corrupt live data.
🔒 Hard isolation boundary
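The boundary is easiest to see in code: the scrape step's only side effect is one file on disk, and no database client is imported at all. The sketch below assumes a simple envelope (`scrapedAt`, `source`, `events`); the real file layout is not documented here.

```typescript
import { writeFileSync } from 'node:fs';

// Hypothetical envelope for a scrape run.
interface ScrapeFile {
  scrapedAt: string; // ISO timestamp of the run
  source: string;
  events: unknown[]; // raw items, untouched
}

function buildScrapeFile(source: string, events: unknown[], now: Date = new Date()): ScrapeFile {
  return { scrapedAt: now.toISOString(), source, events };
}

// The only write this step performs. Persistence happens elsewhere, later,
// and only after review.
function writeScrape(path: string, file: ScrapeFile): void {
  writeFileSync(path, JSON.stringify(file, null, 2) + '\n', 'utf8');
}
```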
03
LLM extraction
llm.ts · OpenRouter
Raw, unstructured text and HTML are passed to a language model via OpenRouter, which returns structured fields (title, start and end time, venue, price, category and tags) from prose that a regex could never parse.
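A minimal sketch of this step, assuming OpenRouter's OpenAI-compatible chat completions endpoint. The prompt, the `ExtractedEvent` field names and the validation rules are illustrative assumptions, and the model id is left as a parameter because the case study does not name one.

```typescript
// Hypothetical field names based on the list above.
interface ExtractedEvent {
  title: string;
  start: string;
  end: string | null;
  venue: string | null;
  price: string | null;
  category: string | null;
  tags: string[];
}

const OPENROUTER_URL = 'https://openrouter.ai/api/v1/chat/completions';

function buildRequest(model: string, rawText: string) {
  return {
    model,
    messages: [
      {
        role: 'system',
        content:
          'Extract one event from the text. Reply with a single JSON object using the keys ' +
          'title, start, end, venue, price, category, tags. Use null for anything not stated.',
      },
      { role: 'user', content: rawText },
    ],
  };
}

// Never trust model output: parse defensively and reject anything incomplete.
function parseExtraction(content: string): ExtractedEvent | null {
  let data: unknown;
  try {
    data = JSON.parse(content);
  } catch {
    return null;
  }
  if (typeof data !== 'object' || data === null) return null;
  const { title, start, end, venue, price, category, tags } = data as Record<string, unknown>;
  if (typeof title !== 'string' || typeof start !== 'string') return null;
  const str = (v: unknown) => (typeof v === 'string' ? v : null);
  return {
    title,
    start,
    end: str(end),
    venue: str(venue),
    price: str(price),
    category: str(category),
    tags: Array.isArray(tags) ? tags.filter((t): t is string => typeof t === 'string') : [],
  };
}

async function extractEvent(apiKey: string, model: string, rawText: string): Promise<ExtractedEvent | null> {
  const res = await fetch(OPENROUTER_URL, {
    method: 'POST',
    headers: { Authorization: `Bearer ${apiKey}`, 'Content-Type': 'application/json' },
    body: JSON.stringify(buildRequest(model, rawText)),
  });
  if (!res.ok) throw new Error(`OpenRouter returned ${res.status}`);
  const json = (await res.json()) as { choices?: { message?: { content?: string } }[] };
  return parseExtraction(json.choices?.[0]?.message?.content ?? '');
}
```

Returning `null` for anything that fails validation keeps malformed model output from flowing into the normalise stage.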
04
Normalise & filter
normalise.ts · timezone · blocklist
Extracted records are coerced into the canonical events schema: times shifted to Asia/Phnom_Penh, slugs generated, categories mapped, and a blocklist drops spam and non-events before they go further.
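The individual coercions are small. The sketch below shows one plausible version of each; the category aliases and blocklist terms are placeholders rather than the site's real lists, and `toInstant` assumes the source supplies a local wall-clock time with no offset.

```typescript
// Placeholder data, for illustration only.
const CATEGORY_ALIASES: Record<string, string> = {
  'live music': 'music',
  gig: 'music',
  'happy hour': 'nightlife',
};
const BLOCKLIST = ['giveaway', 'for sale'];

// Cambodia keeps UTC+07:00 all year (no daylight saving), so a wall-clock time
// without an offset can be pinned directly. Input must not already carry an offset.
function toInstant(localWallTime: string): string {
  return new Date(`${localWallTime}+07:00`).toISOString();
}

// URL-safe slug: strip accents, lowercase, collapse everything else to hyphens.
function slugify(title: string): string {
  return title
    .normalize('NFKD')
    .replace(/[\u0300-\u036f]/g, '')
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
}

function mapCategory(raw: string): string {
  const key = raw.trim().toLowerCase();
  return CATEGORY_ALIASES[key] ?? key;
}

function notBlocked(ev: { title: string }): boolean {
  const title = ev.title.toLowerCase();
  return !BLOCKLIST.some((term) => title.includes(term));
}
```

Note that this `slugify` only keeps Latin letters and digits, so a title written entirely in Khmer script would need a different strategy, such as an id-based slug.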
05
Dual-algorithm dedup
dedup-check.ts · exact + fuzzy
Two layers decide each record’s fate: an exact match on source identity, then a fuzzy similarity score across title, date and venue. The output is a decision: insert, merge or stage.
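The two layers can be sketched as follows. The similarity measure (token overlap on title and venue, equality on date), the weights and the 0.75 threshold are all assumptions chosen for illustration; the case study does not say which metric the real `dedup-check.ts` uses. The sketch stops at classifying the match and leaves the mapping to a final decision to the caller.

```typescript
// Hypothetical record shape for comparison purposes.
interface Candidate {
  sourceId: string;
  externalId: string;
  title: string;
  day: string; // local calendar day, YYYY-MM-DD
  venue: string;
}

type Match =
  | { match: 'exact' | 'fuzzy'; target: Candidate; score: number }
  | { match: 'none'; score: number };

function tokens(s: string): Set<string> {
  return new Set(s.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
}

// Jaccard overlap of word sets: 1 = identical vocabulary, 0 = nothing shared.
function jaccard(a: string, b: string): number {
  const A = tokens(a);
  const B = tokens(b);
  if (A.size === 0 && B.size === 0) return 1;
  const shared = Array.from(A).filter((t) => B.has(t)).length;
  return shared / (A.size + B.size - shared);
}

// Assumed weighting: title matters most, then date, then venue.
function similarity(a: Candidate, b: Candidate): number {
  const sameDay = a.day === b.day ? 1 : 0;
  return 0.5 * jaccard(a.title, b.title) + 0.3 * sameDay + 0.2 * jaccard(a.venue, b.venue);
}

function dedupCheck(ev: Candidate, existing: Candidate[], threshold = 0.75): Match {
  // Layer 1: exact match on source identity catches re-scrapes.
  const exact = existing.find((e) => e.sourceId === ev.sourceId && e.externalId === ev.externalId);
  if (exact) return { match: 'exact', target: exact, score: 1 };

  // Layer 2: best fuzzy score across title, date and venue.
  let best: Candidate | undefined;
  let bestScore = 0;
  for (const e of existing) {
    const score = similarity(ev, e);
    if (score > bestScore) {
      best = e;
      bestScore = score;
    }
  }
  return best && bestScore >= threshold
    ? { match: 'fuzzy', target: best, score: bestScore }
    : { match: 'none', score: bestScore };
}
```

With these weights, “Example Bar Jazz Night” and “Jazz Night at Example Bar” on the same date at the same venue score 0.9, which is comfortably above the threshold.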
06
Human review gate
admin/pending · staging
Anything uncertain lands in a staging queue where an editor approves, merges or rejects it. Nothing the machine is unsure about reaches the public site unsupervised.
✋ Required human approval
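The gate can be modelled as a tiny state machine in which a staged record leaves `pending` only through a named editor's action. The types and function names below are hypothetical; the real admin queue is not shown in this case study.

```typescript
type ReviewAction = 'approve' | 'merge' | 'reject';
type StagedStatus = 'pending' | 'approved' | 'merged' | 'rejected';

// Hypothetical staged-record shape.
interface StagedRecord {
  id: string;
  status: StagedStatus;
  reviewedBy?: string;
}

// A record can be reviewed once, and only by a named editor.
function review(record: StagedRecord, action: ReviewAction, editor: string): StagedRecord {
  if (record.status !== 'pending') throw new Error(`Record ${record.id} was already reviewed`);
  if (!editor.trim()) throw new Error('A named editor is required');
  const status: StagedStatus =
    action === 'approve' ? 'approved' : action === 'merge' ? 'merged' : 'rejected';
  return { ...record, status, reviewedBy: editor };
}

// Only records a person has accepted are eligible for the publish step.
const publishable = (r: StagedRecord): boolean =>
  (r.status === 'approved' || r.status === 'merged') && !!r.reviewedBy;
```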
07
Enrich, persist & ship
Apify · Supabase · Netlify
Approved records are enriched (images via Apify, venue geocoding) and written to Supabase Postgres. The site then re-prerenders and deploys to Netlify’s CDN, returning to a fast static guide.
scripts/intake/run-intake.ts — the orchestrator
// fetch → normalise → dedup-check → (insert | merge | stage)
const raw      = await fetchSource(source);        // RawEvent[]
const events   = raw.map(normalise);            // → events schema, TZ-correct
const clean    = events.filter(notBlocked);    // blocklist

for (const ev of clean) {
  const verdict = await dedupCheck(ev);    // exact + fuzzy

  if (verdict.kind === 'duplicate') skip(ev);
  else if (verdict.kind === 'merge')  mergeInto(verdict.target, ev);
  else                                stageForReview(ev); // ← human gate
}
// the scraper itself? it only ever wrote a .json file.
Isolated by design
Scrape, extract, dedup and publish are separate stages with hard boundaries, so failures stay contained instead of cascading into live data.
Human-in-the-loop
The model proposes; a person disposes. Automation handles the volume; the single approval gate handles the trust.
Model-agnostic
Extraction runs through OpenRouter, so the underlying model can be swapped without rewriting the pipeline around it.
The stack

Built to stay cheap and fast.

A deliberately lean, mostly-static architecture: sophisticated where it counts, boring everywhere else.

SvelteKit
Framework · SSG
Every route prerendered to static HTML for instant loads.
Supabase
Database
Postgres of record for approved events, venues and guides.
Netlify
Hosting · CDN
Static deploys served globally, rebuilt on new data.
OpenRouter
LLM gateway
Model-agnostic extraction of structure from raw text.
MapLibre GL
Maps
Clustered, filterable venue map over CARTO tiles.
Apify
Scraping · assets
Facebook actors and image/asset enrichment.
TypeScript
Language
End-to-end types from intake scripts to UI.
tsx scripts
Intake jobs
The pipeline runs as standalone, schedulable scripts.
Why it holds up

Autonomous, but never unaccountable.

JSON-only
The scraper can’t write to the database; it only ever emits files.
Staged
Uncertain records wait in a review queue, never auto-published.
Sourced
A public “Our sources” page documents where data comes from.
Static
Prerendered output means no live DB in the request path to fail.
A functional demonstrator

See it running.

Angkor Now is live: a real, self-maintaining city guide built and sponsored by AI Boffin Hub to show what an LLM-run editorial product looks like in production.