Plan: Laureo AI — personalized prompts (precomputed) + interactive capability tour [REVISED]

resolved-cell: full·dynamic regime: high

Created: 2026-06-16 (rev 2)

Branch: main

Worktree: (none — authored on main; /implement should create one via /worktree)

<!-- ← Back to brief (current-plan-brief.md) — the plain-language owner brief; this full plan is for the executing agents -->

PLANNING PHASE ONLY — NOT IMPLEMENTED. Nothing built. Owner greenlight required before /implement.
REVISION 2 — supersedes rev 1 after owner review. Driven by 4 verified research threads (3 code-grounded, 1 external cited): (a) the "engine already exists" claim was misleading — corrected below; (b) re-architected around background precomputation (owner's ask: prompts in the DB *before* login, page just reads); (c) added an org/data-personalized library section; (d) added a major new pillar — the interactive first-run capability tour. Every design call is backed by code evidence or a cited source (see Evidence base).

The problem (owner's words)

Owner: *"The current prompts are extremely generic and not personalized… the few prompts we display need to immediately show the powerful capabilities of the Laureo AI assistant so that users see it's not just a chatbot like ChatGPT that is disconnected from your work tools."* Plus, on review: *"you mentioned the prompt engine already exists but I personally do not see it… the cached/instant prompts should always be cached in the background, not computed in real time… the data is already in our database prior to the user even logging in."* Plus two new asks: a personalized library section (org/own-data), and an interactive onboarding capability tour — a dismissible first-run experience ("Step 1 of 5") that demonstrates the breadth of the assistant on manufactured data with clickable demos, because *"users don't know its breadth of capabilities… if they're not nudged, they're just not going to get there."*

CORRECTED current-state truth (verified, file:line)

My earlier claim that "the engine renders on detail pages" was misleading. The verified truth:


Locked decisions

Owner-locked (rev 1): Engine = Hybrid (rules + LLM) · Library "More" = in-chat slide-over drawer · Pinning ("make it mine") = in v1.

Owner-locked (rev 2): Build sequencing = all four pillars together in a single v1 ship · Tour launch = opt-in dismissible card · Tour demo data = scripted fake data only (no DB writes, no live LLM call during the tour).

Research-driven design choices folded in this revision:


Evidence base (the proof — full URLs in the Citations section)

  1. Suggested prompts get ignored when done wrong. NN/g's Amazon "Rufus" study: *"None of our participants proactively clicked a prompt suggestion."* Causes: competing mental model (a search bar), generic phrasing, surprise destination, weird icon/label. → Counter each: keep prompts inside the dedicated assistant, personalize with real specifics, do exactly what the label says, plain language. [NN/g Rufus]
  2. Specific + personalized wins. NN/g: *"Broad or generic prompt suggestions are rarely effective… Specific and targeted suggestions… are more likely to lead to meaningful interaction."* Placement *"near the text input field."* Context-aware to role/history. [NN/g use-case prompts]
  3. Follow-up suggestions are the highest-value type (post-answer, tailored). Empty-state "use-case" prompts are lower-yield → must be made specific to earn the click. [NN/g prompt-suggestion taxonomy]
  4. The "gulf of envisioning." Users can't envision what an LLM can do (capability/instruction/intentionality gaps). Remedy: suggest concrete prompt ideas, domain-specific entry points, show explainable output. [CHI 2024] Reinforced by NN/g's "articulation barrier" (*<20%* of people are fluent enough for bare prompt boxes → use chips/GUI, not an empty field).
  5. People underuse AI. ~65% of US workers use AI little or not at all (Pew); ~23% use it for work weekly (NBER). The breadth that users never discover unaided is real → nudging is justified.
  6. Closest analogs validate the thesis. HubSpot Breeze grounds every answer in real CRM data (the differentiator); Intercom Fin demos on the tenant's own/derived data before commit; Copilot uses a curated prompt gallery + ≤10-prompt welcome; Gemini uses a 4-card empty state. [vendor docs]
  7. Tours: short, skippable, interactive, never dead-end. 3–5 steps (completion falls off past 5); visible Skip on every step; action-driven ("show, don't tell") beats passive tooltips; forced modals and highlight-everything are anti-patterns; AI-specific hazard = a step that dead-ends on empty data → seed/label sample data. [Thinkific, Appcues, Userpilot — magnitudes directional]

Pillar 1 — Precomputed, personalized starter prompts (the engine)

1a. Background precompute architecture (owner's core ask)

Build a new ai_starter_prompts table the ai_user_writing_styles way (verified precedent), not the lazy-Redis way:

  user_id UUID PRIMARY KEY → profiles(user_id) ON DELETE CASCADE
  organization_id BIGINT NOT NULL → organizations ON DELETE CASCADE
  prompts JSONB NOT NULL DEFAULT '[]'      -- [{id,label,prompt,category}], the visible set + a few extra
  schema_version INT NOT NULL DEFAULT 1
  source TEXT NOT NULL DEFAULT 'rules'     -- 'rules' | 'llm'
  last_computed_at TIMESTAMPTZ
  next_refresh_at TIMESTAMPTZ DEFAULT NOW()
  enabled BOOLEAN DEFAULT true
  created_at/updated_at TIMESTAMPTZ DEFAULT NOW()
  -- partial index (next_refresh_at) WHERE enabled; index (organization_id); own-rows RLS USING (user_id = auth.uid())

Why a table, not Redis: the owner wants the set present before login; aiToday's Redis day-key is lazily populated on first request (aiToday.ts:418-434), and lib/CLAUDE.md bans Redis for user data in SSR. A DB row is read in the existing page prefetch path with zero added LLM calls.
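A minimal sketch of the two read paths this table implies, assuming a row shape that mirrors the columns listed above. `StarterRow`, `promptsForPage`, `isDueForRefresh`, and `FALLBACK_CHIPS` are hypothetical names used for illustration (the last stands in for the existing EMPTY_STATE_CHIPS), and treating a null `next_refresh_at` as "not scheduled" is an assumption.

```typescript
// Hypothetical shapes mirroring the columns listed above; not the real generated types.
interface StarterPrompt {
  id: string;
  label: string;
  prompt: string;
  category: string;
}

interface StarterRow {
  user_id: string;
  prompts: StarterPrompt[];
  source: "rules" | "llm";
  next_refresh_at: string | null; // ISO timestamp
  enabled: boolean;
}

// Placeholder standing in for the existing EMPTY_STATE_CHIPS constant.
const FALLBACK_CHIPS: StarterPrompt[] = [
  { id: "fallback", label: "What can you do?", prompt: "What can you help me with?", category: "KNOW" },
];

// Page-load path: return whatever is stored and never call a model here.
function promptsForPage(row: StarterRow | null): StarterPrompt[] {
  if (row === null || row.prompts.length === 0) return FALLBACK_CHIPS;
  return row.prompts;
}

// Cron path: the predicate served by the partial index on next_refresh_at WHERE enabled.
// Assumption: a null next_refresh_at means "not scheduled".
function isDueForRefresh(row: StarterRow, now: Date): boolean {
  if (!row.enabled || row.next_refresh_at === null) return false;
  return new Date(row.next_refresh_at).getTime() <= now.getTime();
}
```

The page path is a pure read with a static fallback, which is what keeps load time free of model calls; only the cron path ever looks at `next_refresh_at`.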

1b. Selection algorithm (lib/ai/promptLibrary/select.ts, pure)

selectStarters(ctx) -> Prompt[]   // ctx = {role, seatType, features, counts, timeOfDay, activationScore, prefs}
  candidates = CATALOG
    .filter(roles.includes(role ?? 'sales_rep'))
    .filter(!requiresFeature || features.includes(requiresFeature))
    .filter(!nonEmptyKey || counts[nonEmptyKey] > 0)          // NEVER dead-end (evidence #7)
    .filter(!prefs.hidden.includes(id))
    .concat(prefs.custom.map(toCandidate))
  score = roleAffinity + timeBonus + activationFit
  visible = resolvePinned(prefs.pinned).slice(0,6)             // user pins first
  fill remaining with CAPABILITY-DIVERSITY constraint until >=4 categories, then by score
  ensureActionSlot(visible)                                    // >=1 of DO/SEND/SCHEDULE/GET_PAID/AUTOMATE
  if activationScore<=1 OR counts all zero: use the "prove-it"/setup prompts (empty-data fallback)
  return visible.slice(0,6)
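The gating and pin-ordering steps of the pseudocode above can be sketched in TypeScript as follows. This is an illustration under stated assumptions, not the planned select.ts: the `Candidate` and `Ctx` shapes are simplified, `score` stands in for roleAffinity + timeBonus + activationFit, pins are resolved against the already-gated pool (an assumption), and the diversity fill, custom prompts, and empty-data fallback are left out.

```typescript
type Category = "KNOW" | "DO" | "SEND" | "SCHEDULE" | "GET_PAID" | "AUTOMATE" | "REPORT";

// Simplified, hypothetical candidate shape.
interface Candidate {
  id: string;
  category: Category;
  roles: string[];
  requiresFeature?: string;
  nonEmptyKey?: string;
  score: number; // stands in for roleAffinity + timeBonus + activationFit
}

interface Ctx {
  role: string | null;
  features: string[];
  counts: Record<string, number>;
  prefs: { pinned: string[]; hidden: string[] };
}

// Gating: role match, feature available, backing data non-empty, not hidden by the user.
function gate(catalog: Candidate[], ctx: Ctx): Candidate[] {
  const role = ctx.role ?? "sales_rep"; // unknown role falls back to the sales_rep pool
  return catalog.filter(
    (c) =>
      c.roles.indexOf(role) !== -1 &&
      (!c.requiresFeature || ctx.features.indexOf(c.requiresFeature) !== -1) &&
      (!c.nonEmptyKey || (ctx.counts[c.nonEmptyKey] ?? 0) > 0) &&
      ctx.prefs.hidden.indexOf(c.id) === -1
  );
}

// Pins first (ids that no longer resolve are skipped silently), then the rest by score.
function orderCandidates(catalog: Candidate[], ctx: Ctx, max = 6): Candidate[] {
  const pool = gate(catalog, ctx);
  const byId = new Map(pool.map((c) => [c.id, c] as [string, Candidate]));
  const pinned: Candidate[] = [];
  for (const id of ctx.prefs.pinned) {
    const hit = byId.get(id);
    if (hit && pinned.indexOf(hit) === -1) pinned.push(hit);
  }
  const rest = pool
    .filter((c) => pinned.indexOf(c) === -1)
    .sort((a, b) => b.score - a.score);
  return pinned.concat(rest).slice(0, max);
}
```

Keeping the function pure (no I/O, no clock reads) is what makes the gating and pinned cases straightforward to unit-test.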

Default 6 visible (evidence: ~3–5 is the consensus; 6 matches the current grid and the owner's "top 6" — treat as the A/B-testable default, not a constant). Time-of-day from timezone (morning→plan-my-day; eod→log-today; friday→forecast).
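One way to derive the time-of-day bucket from a user's timezone is the standard `Intl.DateTimeFormat` API, sketched below. The `Slot` names follow the buckets mentioned above, but the hour thresholds (before 11:00 is morning, 16:00 onward is eod) and the choice to treat all of Friday as the friday slot are assumptions; the plan does not specify them.

```typescript
type Slot = "morning" | "eod" | "friday" | "default";

// Buckets a moment into a slot using the user's IANA timezone.
// Thresholds and Friday precedence are illustrative assumptions.
function timeSlot(now: Date, timeZone: string): Slot {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    weekday: "short",
    hour: "numeric",
    hour12: false,
  }).formatToParts(now);

  const part = (type: string): string => {
    const hit = parts.filter((p) => p.type === type)[0];
    return hit ? hit.value : "";
  };

  // Some engines render midnight as "24" when hour12 is false, hence the modulo.
  const hour = parseInt(part("hour"), 10) % 24;
  if (part("weekday") === "Fri") return "friday";
  if (hour < 11) return "morning";
  if (hour >= 16) return "eod";
  return "default";
}
```

Because the cron computes the set ahead of time, the slot would have to be evaluated for the expected read time rather than the compute time; how that is handled is left open here.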

1c. The capability-diversity rule (kills "it's just ChatGPT")

The visible set is forced to span ≥4 of 7 capability categories and always include ≥1 action category, so the user *sees* it can act, not just chat:

| Enum | Superpower | Example (outcome- + record-specific) | Tools |
|---|---|---|---|
| KNOW | Read/analyze | "Triage my pipeline into Now / This week / Watch" | scan_deals, get_pipeline_summary |
| DO | Create | "Add Globex as a lead and open a deal" | create_company, create_opportunity |
| SEND | Email | "Draft a follow-up to Acme on the $40k renewal — quiet 9 days" | draft_email |
| SCHEDULE | Calendar | "Book a 30-min demo with Jane next Tue 2pm + send the invite" | create_calendar_event |
| GET_PAID | Quotes/AR | "Chase the $8k Acme invoice (12 days overdue)" | list_overdue_invoices, draft_dunning_email |
| AUTOMATE | Agents | "Watch my stalled deals every morning while I sleep" | agent templates |
| REPORT | Manager | "How's the team tracking vs target this quarter?" | get_dashboard_metrics, summarize_win_loss |

The LLM layer's job is precisely to inject the real record names into llmUpgradable slots (the NN/g specificity lever).
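The diversity rule in this section can be sketched as a two-pass greedy fill: first take the best-scoring prompt from each unseen category until the minimum category count is covered, then fill the remaining slots by score. `pickDiverse` and `Scored` are hypothetical names, and pins are ignored to keep the sketch short. Note that with the seven categories above only KNOW and REPORT are non-action, so covering four categories already implies an action prompt; the explicit action-slot guard only matters for small pools or small `max` values.

```typescript
type Category = "KNOW" | "DO" | "SEND" | "SCHEDULE" | "GET_PAID" | "AUTOMATE" | "REPORT";

const ACTION: Category[] = ["DO", "SEND", "SCHEDULE", "GET_PAID", "AUTOMATE"];

interface Scored {
  id: string;
  category: Category;
  score: number;
}

function pickDiverse(candidates: Scored[], max = 6, minCategories = 4): Scored[] {
  const sorted = candidates.slice().sort((a, b) => b.score - a.score);
  const picked: Scored[] = [];
  const seen = new Set<Category>();

  // Pass 1: best prompt from each unseen category until minCategories are covered.
  for (const c of sorted) {
    if (picked.length >= max || seen.size >= minCategories) break;
    if (!seen.has(c.category)) {
      picked.push(c);
      seen.add(c.category);
    }
  }

  // Pass 2: fill whatever slots remain purely by score.
  for (const c of sorted) {
    if (picked.length >= max) break;
    if (picked.indexOf(c) === -1) picked.push(c);
  }

  // Action-slot guard: if no action category made it, replace the last pick
  // with the best-scoring action prompt, when one exists.
  const isAction = (c: Scored) => ACTION.indexOf(c.category) !== -1;
  if (picked.length > 0 && !picked.some(isAction)) {
    const best = sorted.find(isAction);
    if (best) picked[picked.length - 1] = best;
  }
  return picked;
}
```

In the default configuration three strong KNOW prompts cannot crowd out the other categories, which is the behavior the rule is meant to guarantee.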

1d. The catalog (lib/ai/promptLibrary/catalog.ts, typed, code-defined)

~40–55 entries, each {id,label,prompt,category,jobCategory,roles[],requiresFeature?,nonEmptyKey?,timeOfDay?,minActivation?,llmUpgradable?,proveIt?}. Labels outcome-phrased, plain language (evidence #1). Reuse the 6 current chips as sales_rep seeds.
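As a rough illustration of a typed catalog entry and the "catalog well-formed" test, the sketch below uses the field names from the list above. The field types, the `timeOfDay` values, the sample `jobCategory` and `nonEmptyKey` values, and the specific checks in `validateCatalog` are assumptions for illustration only.

```typescript
type Category = "KNOW" | "DO" | "SEND" | "SCHEDULE" | "GET_PAID" | "AUTOMATE" | "REPORT";

// Field names come from the plan; the types are assumed.
interface CatalogEntry {
  id: string;
  label: string;
  prompt: string;
  category: Category;
  jobCategory: string;
  roles: string[];
  requiresFeature?: string;
  nonEmptyKey?: string;
  timeOfDay?: "morning" | "eod" | "friday";
  minActivation?: number;
  llmUpgradable?: boolean;
  proveIt?: boolean;
}

// Returns a list of human-readable problems; an empty list means well-formed.
function validateCatalog(entries: CatalogEntry[]): string[] {
  const errors: string[] = [];
  const ids = new Set<string>();
  for (const e of entries) {
    if (ids.has(e.id)) errors.push(`duplicate id: ${e.id}`);
    ids.add(e.id);
    if (e.roles.length === 0) errors.push(`${e.id}: no roles`);
    if (e.label.trim() === "" || e.prompt.trim() === "") {
      errors.push(`${e.id}: empty label or prompt`);
    }
  }
  return errors;
}

// Sample entry built from the KNOW row of the capability table.
const SAMPLE_ENTRY: CatalogEntry = {
  id: "triage-pipeline",
  label: "Triage my pipeline into Now / This week / Watch",
  prompt: "Triage my pipeline into Now / This week / Watch",
  category: "KNOW",
  jobCategory: "pipeline", // hypothetical value
  roles: ["sales_rep"],
  nonEmptyKey: "deals", // hypothetical counts key
};
```

Returning a list of errors rather than throwing on the first one lets a single test run report every malformed entry at once.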


Pillar 2 — The prompt library drawer (in-chat "More")


Pillar 3 — Pinning / "make it mine" (v1)

This follows the Per-User Feature State on JSONB convention. Add profiles.ai_prompt_prefs JSONB NOT NULL DEFAULT '{}':

{ "schema_version": 1,
  "pinned": ["<id>"], "hidden": ["<id>"],
  "custom": [{ "id":"<uuid>", "label":"…", "prompt":"…", "created_at":"…" }],
  "capability_tour": { "dismissed_at": null, "last_step": 0, "completed_at": null } }

(Putting tour state here, not in onboarding_state, keeps all AI-assistant per-user state in one column and avoids editing the onboarding_state hydrator.) Helpers lib/ai/promptPrefs.ts: hydratePromptPrefs, buildInitialPromptPrefs, resolveVisible. API app/api/ai/prompt-prefs/route.ts PATCH (withAuth): pin/reorder/hide, add/edit/delete custom, dismiss tour. Validation: pinned≤12, custom≤20, label≤60, prompt≤600, strip control chars, org+own-user scoped. On change → set ai_starter_prompts.next_refresh_at=now() so the next read reflects pins.
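The hydration and validation rules above could look roughly like this. It is a sketch, not the planned lib/ai/promptPrefs.ts: the function bodies, the `LIMITS` and `Result` names, and the choice to strip every C0 control character (including newlines) before checking lengths are assumptions.

```typescript
interface CustomPrompt { id: string; label: string; prompt: string; created_at: string; }

interface PromptPrefs {
  schema_version: number;
  pinned: string[];
  hidden: string[];
  custom: CustomPrompt[];
  capability_tour: { dismissed_at: string | null; last_step: number; completed_at: string | null };
}

// Caps taken from the validation rules above.
const LIMITS = { pinned: 12, custom: 20, label: 60, prompt: 600 };

type Result<T> = { ok: true; value: T } | { ok: false; error: string };

// Assumption: all C0 control characters and DEL are removed, then whitespace is trimmed.
function stripControlChars(s: string): string {
  return s.replace(/[\u0000-\u001F\u007F]/g, "").trim();
}

// Fills defaults so a '{}' column value hydrates to a complete object.
function hydratePromptPrefs(raw: unknown): PromptPrefs {
  const r = (raw && typeof raw === "object" ? raw : {}) as Partial<PromptPrefs>;
  const tour = r.capability_tour;
  return {
    schema_version: typeof r.schema_version === "number" ? r.schema_version : 1,
    pinned: Array.isArray(r.pinned) ? r.pinned.slice(0, LIMITS.pinned) : [],
    hidden: Array.isArray(r.hidden) ? r.hidden : [],
    custom: Array.isArray(r.custom) ? r.custom.slice(0, LIMITS.custom) : [],
    capability_tour: {
      dismissed_at: tour?.dismissed_at ?? null,
      last_step: tour?.last_step ?? 0,
      completed_at: tour?.completed_at ?? null,
    },
  };
}

// Validates a new custom prompt; a failed result would map to a 400 in the route.
function validateCustom(
  label: string,
  prompt: string,
  existingCount: number
): Result<{ label: string; prompt: string }> {
  const cleanLabel = stripControlChars(label);
  const cleanPrompt = stripControlChars(prompt);
  if (existingCount >= LIMITS.custom) return { ok: false, error: "custom limit reached" };
  if (cleanLabel.length === 0 || cleanLabel.length > LIMITS.label) return { ok: false, error: "label length" };
  if (cleanPrompt.length === 0 || cleanPrompt.length > LIMITS.prompt) return { ok: false, error: "prompt length" };
  return { ok: true, value: { label: cleanLabel, prompt: cleanPrompt } };
}
```

Whether control characters should be stripped or rejected outright is a design choice the route still has to make; this sketch strips first and rejects only on length.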


Pillar 4 — Interactive first-run capability tour (NET-NEW)

Goal: nudge users to discover the breadth of the assistant (evidence #4/#5), via a short, skippable, *interactive* demo on *manufactured* data — without polluting the org or costing LLM calls.

4a. Architecture — self-contained, deterministic, free

The chat hook has no scripted-turn API (useAIChat only has sendMessage, which always fires a live billed LLM call; verified). But the card components are purely presentational and can be driven with hand-built objects + callbacks at zero LLM cost: ArtifactCard + EmailDraftEditor (the email-draft demo), ApprovalCard, ActionResultCard ("Open Draft" chip), MarkdownRenderer, FollowupChips (clickable next-steps). So:

4b. Step design (≤5, one per capability category — evidence #7)

A 5-step arc that shows breadth, each a *clickable* demo on sample data ending in a "try the next one" chip:

  1. Know — "Here's your pipeline at a glance" (scripted summary card).
  2. Send — "I can draft an email" → click → a real ArtifactCard/EmailDraftEditor renders a draft to a sample contact; user can open/review it.
  3. Do — "I can create records" → click → an ApprovalCard shows a scripted create_task/create_opportunity preview.
  4. Get paid / Schedule (role-branched) — "I can chase an invoice" or "I can book a meeting" → scripted card.
  5. Automate — "I can do this on a schedule, for you" (agent template teaser) + a finale that hands off to the real /ai composer with a starter prompt prefilled (the user chooses to send it — the tour itself fires no live AI call, per the scripted-only decision).

Each step has a visible Skip and a "Step k of 5". Role-branched content where cheap (rep vs manager vs service).
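The five-step arc and its persisted state can be sketched as a small scripted registry plus pure state transitions. All names here (`buildTourSteps`, `advance`, `skip`, `progressLabel`, the `card` strings) are hypothetical, `last_step` is assumed to count completed steps, and how roles map to the step-4 branch is deliberately left to the caller because the plan does not fix it.

```typescript
type Branch = "GET_PAID" | "SCHEDULE";

interface TourStep {
  category: string;
  title: string;
  card: string; // which presentational card renders the scripted demo
}

// Same shape as capability_tour inside ai_prompt_prefs.
interface TourState {
  dismissed_at: string | null;
  last_step: number; // assumption: number of steps completed so far
  completed_at: string | null;
}

// Scripted registry: fixed strings only, no DB reads and no model calls.
function buildTourSteps(branch: Branch): TourStep[] {
  const step4: TourStep =
    branch === "GET_PAID"
      ? { category: "GET_PAID", title: "I can chase an invoice", card: "scripted" }
      : { category: "SCHEDULE", title: "I can book a meeting", card: "scripted" };
  return [
    { category: "KNOW", title: "Here's your pipeline at a glance", card: "summary" },
    { category: "SEND", title: "I can draft an email", card: "ArtifactCard" },
    { category: "DO", title: "I can create records", card: "ApprovalCard" },
    step4,
    { category: "AUTOMATE", title: "I can do this on a schedule, for you", card: "agent-teaser" },
  ];
}

// Moves forward one step; finishing the last step marks the tour complete.
function advance(state: TourState, total: number, nowIso: string): TourState {
  const next = state.last_step + 1;
  return next >= total
    ? { ...state, last_step: total, completed_at: nowIso }
    : { ...state, last_step: next };
}

// Skip is available on every step and only records the dismissal.
function skip(state: TourState, nowIso: string): TourState {
  return { ...state, dismissed_at: nowIso };
}

function progressLabel(stepNumber: number, total: number): string {
  return `Step ${stepNumber} of ${total}`;
}
```

Because the transitions are pure, the "tour step registry" test can assert the full arc without rendering any component.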

4c. Launch & placement (evidence #7 — never force)


Files

New: lib/ai/promptLibrary/{types,catalog,select}.ts · lib/ai/promptPrefs.ts · lib/ai/starterPrompts.ts (substrate + LLM refine) · app/api/cron/refresh-starter-prompts/route.ts · app/api/ai/prompt-prefs/route.ts · app/components/ai/PromptLibraryDrawer.tsx · app/components/ai/CapabilityTour/ (component + scripted step registry + sample-fixture demo data) · tests (select diversity/action-slot/gating/pinned/empty-fallback; promptPrefs hydrate; catalog well-formed; tour step registry; cron smoke).

Modified: app/components/ai/AIPanel.tsx (empty state reads precomputed set + pin icon + "Browse all" + first-run tour card) · app/ai/page.tsx + AIPageClient.tsx (prefetch ai_starter_prompts; thread role; mount tour card) · app/components/ai/DashboardAIPanel.tsx (inherit) · app/api/auth/callback/route.ts (session-start top-up in the existing after()) · scripts/setup-qstash-schedules.ts (register cron) · provisioning path (seed rules row) · lib/onboarding/ wizard (tour step) · lib/ai/chatFlags.ts + lib/ai/CLAUDE.md (doc-drift fix + document the engine).

Migrations (both additive/idempotent, auto-apply): <ts>_ai_starter_prompts.sql (the table) · <ts>_profiles_ai_prompt_prefs.sql (the prefs column).


Acceptance criteria

  1. /ai empty state shows a role-appropriate, precomputed set read from the DB with zero LLM at load (verified: page reads ai_starter_prompts, no synchronous model call).
  2. Visible set spans ≥4 capability categories incl. ≥1 action (unit-proven); llmUpgradable slots name real records when data exists.
  3. No visible prompt dead-ends; empty/new tenant falls back to capability/setup prompts, never a blank or empty-result prompt.
  4. Prompts are precomputed in the background (cron + provisioning seed + session-start top-up) and refreshed; LLM cost is bounded to active, paying-org users and routed through the billing chain; restricted orgs spend $0.
  5. User can pin/reorder/hide/author prompts (persisted, cross-device); pins surface first; changes trigger a refresh.
  6. The drawer opens over chat with job-category groups, a "For you" real-data section, and a "Yours" section.
  7. The capability tour: opt-in dismissible card; ≤5 interactive steps on sample-fixture data (no DB writes, no LLM); visible Skip each step; dismiss/complete persists; available in onboarding AND on /ai first-run; never dead-ends.
  8. tsc 0 · next build PPR-clean · vitest green incl. new tests · every query org-scoped + .limit(), no select('*'), count:'estimated' · cron uses verifyCronRequest + restricted-skip.
  9. Doc-drift fixed; lib/ai/CLAUDE.md documents the precompute engine + tour.

Edge cases

Unknown role → sales_rep pool. New/empty org → capability/setup prompts + the tour still fully demos on sample strings. Collaborator/viewer seat → no LLM refine (rules only), tour still available. Redis/LLM/billing failure → rules set (or EMPTY_STATE_CHIPS). Pinned id removed from catalog → skip silently. Custom prompt over caps/control chars → 400. messages.length>0 → starters/tour card hidden. Cron partial failure → per-user next_refresh_at pushed, retried next run.

Phasing — single v1 ship (owner-locked: build all pillars together)

v1 = all four pillars in one ship. Pillar 1 (precompute engine: table + cron + rules/LLM + provisioning seed + session-start top-up; selection + diversity + non-empty + record-naming) · Pillar 2 (drawer incl. "For you" + "Yours") · Pillar 3 (pinning/custom) · Pillar 4 (interactive capability tour: tour component, scripted step registry, opt-in dismissible card, onboarding + /ai placements) · home-widget inheritance · tests · doc-drift fix.

Suggested internal build order (dependency ordering for /implement, NOT separate ships): migrations → precompute engine + selection algorithm → page/panel wiring + drawer → pinning/prefs → capability tour. File-disjoint where possible; all gated and shipped together.

Cost & scale

Pre-launch, tiny scale. Cron envelope: AI refinement capped at .limit(50) per run, like writing-styles (~$0.05/run, micro tier); the rules layer is free for all users. No new env flag is required (openers already render unconditionally); an optional default-ON kill-switch in chatFlags.ts reverts to EMPTY_STATE_CHIPS.

Risks

Citations (proof)