Michiu / marketing reference

Reply playbook — operating the brand-voice reply system

Read this before writing replies. The system generates the words; this doc covers the four behaviors that decide whether the words land as brand-voice or evaporate as template.

Companion to:

  • Review-Reply-Prompt.md (vault — the prompt itself, with worked examples)
  • docs/reply-behavior-benchmark.md (where the data behind the playbook came from)

The opportunity, in one sentence

Across the 10 closest premium-Asian peers in Amsterdam, the median Google reply rate is ~0%. Only Yamazato and Sazanka (both Hotel Okura) reply consistently. Reply quality is a wide-open competitive moat — every move in this playbook is fully under Michiu's control, costs nothing on the menu, and shows publicly on every review page.


The four behaviors that separate Sazanka-tier from the rest

Pulled directly from the benchmark. None of these depend on the LLM — they depend on the operator running the system.

1. Sign with a real first name, every time

Sazanka's negative replies open "I would like to start by apologizing…" and close "— The Teppanyaki Sazanka Restaurant team." Yamazato signs with named staff (Chihomi, Yathin, Akos). Generic "Team Michiu" is dead on arrival — the prompt enforces this.

The system assigns a real Michiu front-of-house name per review, deterministically (same review always gets the same name on regenerate). The pool comes from REPLY_SIGNATURE_POOL in .env.local — set this to the actual rotating staff names before going live. The default placeholder pool (Sara, Mark, Anna, Tom) is a forcing function: if you see those names in production, you forgot to set the env var.
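One way the deterministic pick can work, as a sketch only: hash the review ID into an index on the pool. The function name and hash are illustrative, not the codebase's actual helper; the pool values mirror the REPLY_SIGNATURE_POOL format described above.

```typescript
// Sketch: deterministic signature pick. Hashing the review ID means the
// same review maps to the same name on every regenerate.
// Names here are illustrative, not the real helper.
function pickSignature(reviewId: string, pool: string[]): string {
  let hash = 0;
  for (const ch of reviewId) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // 32-bit rolling hash
  }
  return pool[hash % pool.length];
}

// Parsing a REPLY_SIGNATURE_POOL-style value (comma-separated names):
const pool = "Sara,Mark,Anna,Tom".split(",").map((s) => s.trim());
```

Because the index depends only on the review ID and the pool, regenerating variants never shuffles the signature.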

2. Reply within 48 hours — especially to 1-3★

Negative reviews more than two days old read as ignored, no matter what the reply eventually says. Sazanka's accountability lands because it lands fast.

The inbox surfaces the urgent backlog at the top of the page: "⚠ Urgent: N unanswered 1-3★ reviews older than 48h." Click it to filter to just those. This is the queue that earns the moat — clear it before doing anything else.

Per-row age badges:

  • <24h (green) — plenty of runway
  • 24-72h (amber) — getting urgent
  • 3-7d (orange) — well past target
  • >7d (red) — reputation problem; consider a fresh approach (don't just paste the auto-suggestion months later)
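The badge thresholds above reduce to a single function. A minimal sketch, assuming review timestamps are available as Date objects; names are illustrative and only the hour boundaries come from the playbook:

```typescript
type AgeBadge = "green" | "amber" | "orange" | "red";

// Sketch of the age-badge thresholds listed above.
function ageBadge(reviewDate: Date, now: Date = new Date()): AgeBadge {
  const hours = (now.getTime() - reviewDate.getTime()) / 36e5;
  if (hours < 24) return "green";      // <24h: plenty of runway
  if (hours < 72) return "amber";      // 24-72h: getting urgent
  if (hours < 7 * 24) return "orange"; // 3-7d: well past target
  return "red";                        // >7d: reputation problem
}
```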

3. Match the reviewer's language

Dutch review → Dutch reply. English review → English reply. Mixed → English. The system detects automatically and tags the variant — but verify before approving. Wrong-language replies signal a bot.

4. Reply to negatives with specific accountability

This is where most restaurants flinch. Sichuan Food (ex-Michelin star) has public negative reviews with no replies at all — including specific complaints about pushy upselling. Hosokawa replies to praise but ducks the complaints. Don't be Hosokawa.

The Kahneman pattern (encoded in the prompt's system message):

  • Acknowledge the specific problem in their words. Don't paraphrase to soften.
  • Take responsibility honestly. "On us." "That should not have happened." "We've got it on the radar for the team."
  • Show the gap closing. What you'll do differently, or what you already do well that didn't land that night.
  • Never beg. No "we hope you'll give us another chance." Confidence reads quieter.
  • Never explain why they were wrong. Even if they were.

If the review names a specific dish, moment, staffer, or time — your reply must reference that exact thing. The auto-checklist surfaces "References a specific detail from the review" and turns red if not.
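A minimal sketch of how that specificity auto-check could work, assuming the enrichment step exposes the extracted details (dishes, moments, times) as strings; the real check in the codebase may be more sophisticated, and the function name is hypothetical:

```typescript
// Sketch: the "References a specific detail" check turns red when the
// draft mentions none of the concrete details the enrichment extracted.
// Function name and input shape are illustrative.
function referencesSpecificDetail(reply: string, details: string[]): boolean {
  const lower = reply.toLowerCase();
  return details.some((d) => lower.includes(d.toLowerCase()));
}
```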


Using the admin UI

Inbox (/admin/reviews)

  • Top banner: ⚠ urgent backlog count + filter button
  • Per-row: source, author, rating, date, age badge, urgent tag (if applicable), themes, status, draft count
  • Filter bar: status / source / rating / date range / search by author or text

Detail (/admin/reviews/[id])

  • Review block at top with age badge prominently displayed
  • Enrichment panel: sentiment, themes, mentioned dishes, persona, recurring, Circle tier
  • Target length guidance above the variants — derived from rating:
    • 5★ → 15-45 words (short, confident)
    • 4★ → 25-70 words (warm but brief)
    • 3★ → 50-120 words (acknowledge specifically + take responsibility)
    • 1-2★ → 60-150 words (Kahneman territory)
  • Three reply variants — standard / warmer / tighter — with per-variant word count vs target band, ban-phrase check, voice and language tags
  • Quality checklist below the variants — auto-checks (banned phrases, length, signature, specificity, language) and read-and-confirm checkboxes (tone, voice, no-begging, Circle privacy, accountability)
  • Approve action copies the final text + opens the source URL for paste-back (until Phase 2's Google API auto-post lands)
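The rating-to-band mapping and the per-variant word-count check can be sketched as follows; only the word ranges come from the playbook, the names are illustrative:

```typescript
// Sketch of the target-length guidance above: [min, max] words per rating.
function targetWordBand(rating: number): [number, number] {
  if (rating >= 5) return [15, 45];   // short, confident
  if (rating === 4) return [25, 70];  // warm but brief
  if (rating === 3) return [50, 120]; // acknowledge + take responsibility
  return [60, 150];                   // 1-2 stars: Kahneman territory
}

// Per-variant check: does the draft's word count sit inside the band?
function withinBand(reply: string, rating: number): boolean {
  const words = reply.trim().split(/\s+/).length;
  const [min, max] = targetWordBand(rating);
  return words >= min && words <= max;
}
```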

When to regenerate

  • Auto-checks flag multiple failures (e.g. length and specificity both off)
  • Tone is wrong — Kahneman where Sutherland was needed (or vice versa)
  • Specific detail genuinely missing from all three variants
  • A ban-check hit appears (usually a regression after a prompt-version bump)

When to edit instead of regenerate

  • Variant is 90% there — just needs a small tweak
  • One variant has the structure right, another has the right tone — combine by hand
  • Edge cases the prompt doesn't cover (e.g. reviewer mentions a Michiu staff member by name — paste it in)

When to skip the AI entirely

Don't auto-reply to:

  • Spam / off-topic non-reviews
  • Reviews from a guest you have an offline relationship with (just write it)
  • Crisis situations (allergic reaction, food poisoning report, public-health concern) — escalate; don't send a templated reply

Configuration

Set in .env.local for development; in Vercel/Cloudflare for production.

# Front-of-house staff names that rotate as reply signatures.
# REQUIRED before going live — the default pool is a placeholder.
REPLY_SIGNATURE_POOL=Sara,Mark,Anna,Tom,Lisa,Henry

# Anthropic — drives variant generation
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_MODEL=claude-sonnet-4-6

If the Anthropic balance is at zero or the API errors, the system automatically falls back to deterministic mock variants and surfaces an amber banner on the detail page. The mock keeps the pipeline runnable but shouldn't be approved verbatim — it's templated, not brand-voice.


What success looks like

After 90 days running this playbook, the metrics that should move:

Metric                                | Today (Michiu) | Target
Google reply rate                     | 0%             | 95%+
Reply rate to 1-3★ within 48h         | 0%             | 100%
Replies signed with a real first name | 0%             | 100%
Median reply latency                  | n/a            | <24h
% of replies edited before approval   | n/a            | 30-50% (sweet spot — too low means trust slipping, too high means prompt drifting)

When all of these hit target consistently, reply quality stops being a moat because peers will copy it. By then we want the next moat (probably the events page + RAI-corporate funnel from docs/group-event-capacity.md).


Hard rules — the things that override the playbook

These come from the brand foundation and are not negotiable, ever:

  1. Never auto-post. Human approval gates every reply until the eval harness in Phase 5 passes.
  2. Never reference a Circle member's loyalty tier publicly. The Circle is felt, not announced. Internal flag only.
  3. Never use the banned vocabulary. fine dining, premium, exceptional, refined, sophisticated, elevated, curated, distinctive, unreasonable hospitality, foodie, culinary journey, flavor explosion, taste bud, game-changer. The auto-checklist catches these but read your reply before clicking ✓.
  4. Never claim "unreasonable hospitality" in copy. Live it; don't announce it.
  5. Never use exclamation marks unless the reviewer led with them.
  6. Never quote the review back word-for-word. Paraphrase the reference.
  7. Never invent dishes, events, or staff names the review didn't mention.
  8. Never assume gender from the reviewer's name.

The system enforces what it can; you enforce the rest.
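The auto-check behind rule 3 can be sketched as a case-insensitive scan over the list above; the phrase list is the playbook's own, the function name is illustrative:

```typescript
// Sketch of the banned-vocabulary auto-check from rule 3. A simple
// lowercase substring scan also catches "Fine Dining", "taste buds", etc.
const BANNED_PHRASES = [
  "fine dining", "premium", "exceptional", "refined", "sophisticated",
  "elevated", "curated", "distinctive", "unreasonable hospitality",
  "foodie", "culinary journey", "flavor explosion", "taste bud",
  "game-changer",
];

function bannedHits(reply: string): string[] {
  const lower = reply.toLowerCase();
  return BANNED_PHRASES.filter((p) => lower.includes(p));
}
```

Returning the hits (rather than a boolean) lets the checklist show which phrase tripped the check.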

Source: michiu-marketing/docs/reply-playbook.md. To edit, change the file in the repo and redeploy.