# DALSEEN Dine — Pilot Support Playbook

**Audience:** platform staff supporting a real restaurant pilot.
**Branch surfaced:** dine-phase-22 onwards.
**Contract:** every action listed here is **safe** — no SQL writes, no order/payment/accounting mutations, no hardware integration.

---

## TL;DR (incident response order)

When a pilot site reports a problem:

1. **Open** `/app/dine/ops` for the tenant + branch.
2. **Scan Branch Health** (top panel) — every pill green/neutral = system is fine; rose/amber = where the pain is.
3. **Drill into the relevant panel** based on the colour:
   - alerts strip → severity ladder + recommended action
   - devices → online/offline + rename/deactivate stale rows
   - sync failures → retry / context
   - printer jobs → mark recovered / status
   - pilot diagnostics → recent runtime errors + slow requests
   - stuck orders → orders the cashier may have walked away from
4. **Export diagnostics** (top-right "Export diagnostics" button) and attach the JSON to the support ticket.
5. **Escalate** if any rule below is hit.

---

## 1. How to check branch health

`GET /api/v1/dine/support/branch-health?branch_id={id|all}` — exposed in the FE as the **Branch Health** panel at the top of `/app/dine/ops`.

The panel shows one pill per signal. Threshold colours:

| Pill | Neutral | Amber | Rose |
|---|---|---|---|
| Open orders | always neutral | — | — |
| Delayed orders | 0 | 1–4 | ≥5 |
| Stuck tickets | 0 | 1–4 | ≥5 |
| Low stock | 0 | 1–9 | ≥10 |
| Devices offline | 0 | 1–2 | ≥3 |
| Sync failures | 0 | 1–4 | ≥5 |
| Printer fails | 0 | 1–4 | ≥5 |
| Queue backlog | 0 | 1–49 | ≥50 |
| Slow requests | 0 | 1–9 | ≥10 |
| Runtime errors | 0 | 1–4 | ≥5 |
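
As a minimal sketch, the thresholds reduce to a single rose cut-off per signal (illustrative only; the real colour logic lives in the Branch Health panel and may differ):

```python
# Hypothetical sketch of the pill-colour thresholds from the table above.
# Signal keys are made up for illustration; only the numbers come from the table.
ROSE_MIN = {
    "delayed_orders": 5,
    "stuck_tickets": 5,
    "low_stock": 10,
    "devices_offline": 3,
    "sync_failures": 5,
    "printer_fails": 5,
    "queue_backlog": 50,
    "slow_requests": 10,
    "runtime_errors": 5,
}

def pill_colour(signal: str, count: int) -> str:
    """Map a signal count to neutral / amber / rose per the threshold table."""
    if signal == "open_orders":   # always neutral per the table
        return "neutral"
    if count >= ROSE_MIN[signal]:
        return "rose"
    return "amber" if count >= 1 else "neutral"
```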

If a pill is rose, the corresponding lower panel will show the underlying rows.

---

## 2. How to export diagnostics

Press the **Export diagnostics** button next to the Refresh button on `/app/dine/ops`. This downloads a JSON file named `dine-diagnostics-{branch_id|all}-{ISO timestamp}.json`.

Contents:
- Dashboard summary (every section)
- Devices summary (last 200)
- Sync failures (today, capped 100)
- Printer jobs (today, capped 100)
- Recent operational events (today, capped 200)
- Known limitations

**Attach this to every support ticket.** Reviewers can replay the snapshot without reproducing the incident.

The export is read-only and contains no secrets, no stack traces, no PII beyond what already lives on the dashboard. Verified by `DineSupportDiagnosticsExportTest::test_export_does_not_include_secrets_or_stack_traces`.

---

## 3. How to retry sync safely

Open the **Sync failures** panel. Failed rows show their operation, code, and timestamp.

**Retry button** on each unrecovered row: increments `retry_count` and stamps `last_retry_requested_at`. Feedback appears as `↻×N` next to the operation name.

What it does:
- Records that support requested a retry.
- Emits a `sync.retry_requested` runtime event for the timeline.
- The FE offline outbox treats this as a priority hint on its next sweep.

What it does **NOT** do:
- ❌ Fake `retried_ok=true` (only the FE outbox can flip that, and only after the operation actually succeeds).
- ❌ Mutate any order, payment, or accounting state.
- ❌ Directly invoke any operation handler.
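
The do / don't contract above can be sketched as a pure record update (field names come from the text; the guard and event shape are assumptions):

```python
from datetime import datetime, timezone

def request_retry(row: dict) -> dict:
    """Sketch of the support-side retry: bump the counter, stamp the request,
    emit an event. Never touches retried_ok; only the FE outbox flips that."""
    if row.get("retried_ok"):
        return row  # already recovered; nothing to do (assumed guard)
    updated = dict(row)
    updated["retry_count"] = row.get("retry_count", 0) + 1
    updated["last_retry_requested_at"] = datetime.now(timezone.utc).isoformat()
    updated["events"] = list(row.get("events", [])) + ["sync.retry_requested"]
    return updated
```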

If retry doesn't recover the row within ~10 minutes, escalate.

---

## 4. How to mark printer recovered

Open the **Printer jobs** panel. Failed rows show a "failed" badge plus an error code (e.g. `paper_out`, `offline`, `timeout`).

**Mark recovered button** (emerald) on each failed row: flips `status=succeeded`, stamps `completed_at`, emits a `printer.recovered` event with `manual=true` in the context bag.

What it does **NOT** do:
- ❌ Trigger a reprint (the print agent is responsible for that).
- ❌ Touch any order — print failures don't block the cashier flow.
- ❌ Talk to printer hardware.
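
A minimal sketch of the mutation, assuming only the field names stated in the text (the guard and event shape are hypothetical):

```python
from datetime import datetime, timezone

def mark_recovered(job: dict) -> dict:
    """Sketch of mark-recovered: flip the print row only.
    No reprint, no order mutation, no hardware call."""
    if job.get("status") != "failed":
        raise ValueError("only failed jobs can be marked recovered")  # assumed guard
    updated = dict(job)
    updated["status"] = "succeeded"
    updated["completed_at"] = datetime.now(timezone.utc).isoformat()
    updated["events"] = list(job.get("events", [])) + [
        {"name": "printer.recovered", "context": {"manual": True}},
    ]
    return updated
```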

Use this when the printer is back up and the underlying issue is resolved (paper reload, network restored). It tells the dashboard "this is no longer a live problem".

If the failure rate stays elevated after marking recovered, the underlying printer / network is still the issue — escalate.

---

## 5. How to clean stale devices

Open the **Devices** panel. Each row shows online/offline + last-seen timestamp + three actions:

| Button | Effect | Reversibility |
|---|---|---|
| ✏ (Rename) | Edit `device_name` and/or `branch_id` | always reversible — just rename again |
| ⏸ (**Deactivate** — Phase 22.5) | Soft-deactivate. Sets `is_active=false`, drops the row off the dashboard. **Auto-reactivates on the next heartbeat from the same `device_uuid`.** | reversible: device beats again |
| × (Unregister) | **Hard-delete** the row. Phase 17 endpoint. The next heartbeat creates a fresh row with a fresh `first_seen_at`. | not reversible — historical row is gone |

**Rule of thumb:**
- Use **Deactivate** for "this screen looks stale, maybe came back". Safe default.
- Use **Unregister** only for permanently retired hardware (replaced tablet, decommissioned station).

Both actions are tenant-scoped and emit operational events.
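
The lifecycle in the table can be modelled as a toy registry (illustrative only; the real store is the tenant DB and heartbeats arrive over the API):

```python
from datetime import datetime, timezone

class DeviceRegistry:
    """Toy model of the deactivate / unregister / heartbeat interplay above."""

    def __init__(self):
        self.rows = {}  # device_uuid -> row

    def heartbeat(self, device_uuid: str) -> dict:
        now = datetime.now(timezone.utc).isoformat()
        row = self.rows.get(device_uuid)
        if row is None:
            # unregistered or never seen: fresh row, fresh first_seen_at
            row = {"first_seen_at": now, "is_active": True}
            self.rows[device_uuid] = row
        else:
            row["is_active"] = True  # deactivation auto-reverses on the next beat
        row["last_seen_at"] = now
        return row

    def deactivate(self, device_uuid: str):
        self.rows[device_uuid]["is_active"] = False  # soft, reversible

    def unregister(self, device_uuid: str):
        del self.rows[device_uuid]  # hard-delete; the historical row is gone
```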

---

## 6. How to inspect stuck orders

Open the **Stuck orders** panel. Read-only — no force-close, no state mutation.

Four buckets:

| Bucket | Definition | What it usually means |
|---|---|---|
| Open > 60m | Order is open/sent/ready and was created more than 60 minutes ago | The cashier walked away or the order was forgotten. Needs cashier action. |
| Delayed | Open between 25 and 60 minutes | Service backup, kitchen behind, or a slow ticket. Often resolves itself. |
| Paid, not served | `status=paid` but `served_at` is null | Cashier took payment before pressing "served". Books are correct (cash recorded), kitchen state is irregular. Audit-only — do NOT manipulate. |
| No movement 30m | Open and `updated_at` older than 30 minutes | Stale state. Often a forgotten table. |
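
The four buckets can be sketched as a classifier (status names and boundary handling are assumptions; the real bucketing lives server-side):

```python
from datetime import datetime, timedelta, timezone

OPEN_STATES = {"open", "sent", "ready"}  # assumed from the bucket definitions

def stuck_buckets(order: dict, now: datetime) -> list[str]:
    """Classify an order into the read-only buckets above. An order can sit
    in more than one bucket at once (e.g. Open > 60m and No movement 30m)."""
    buckets = []
    age = now - order["created_at"]
    idle = now - order["updated_at"]
    is_open = order["status"] in OPEN_STATES
    if is_open and age > timedelta(minutes=60):
        buckets.append("open_gt_60m")
    if is_open and timedelta(minutes=25) <= age <= timedelta(minutes=60):
        buckets.append("delayed")
    if order["status"] == "paid" and order.get("served_at") is None:
        buckets.append("paid_not_served")
    if is_open and idle > timedelta(minutes=30):
        buckets.append("no_movement_30m")
    return buckets
```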

**Never** press anything that closes a stuck order from the support side. The cashier flow is the only place that can close orders — that path posts the journal correctly. Forcing closure from outside breaks accounting integrity.

If a stuck order needs intervention, **call the branch manager** and have them close it through the cashier UI.

---

## 7. How to verify cash sale accounting

Cash sales must be inspectable end-to-end. The full chain is locked in `DineCashSaleAccountingPostingTest` (53 assertions on a single live order).

To verify a specific paid order is accounting-clean:

1. Get the `dine_orders.id` from the orders panel or stuck-orders bucket.
2. Look it up in the diagnostics export (or via tenant DB tooling on the platform side):
   ```sql
   SELECT * FROM journal_entries
    WHERE source = 'dine_order' AND source_ref = '<order_id>';
   ```
3. Confirm:
   - exactly one row per order (idempotent — Phase 3)
   - balanced: `SUM(debit) = SUM(credit)` on `journal_lines`
   - every `journal_lines.branch_id` matches the order's branch
   - `tender.cash` debit row matches the cash payment amount
   - `sales.dine_in` (or correct channel account) credit row matches subtotal
   - `tax.vat_payable` credit row matches `tax_total`
   - `cogs.food` debit + `inventory.ingredients` credit reflect recipe consumption
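
The balance and scoping checks can be sketched mechanically (row shapes are assumptions; amounts here are plain numbers, while the real ledger presumably uses exact decimals):

```python
def check_order_journal(entries: list[dict], order: dict) -> list[str]:
    """Sketch of the checklist above. `entries` are the journal_entries rows
    (each with nested `lines`) found for source='dine_order' and the order id.
    Returns a list of problems; an empty list means the order looks clean."""
    problems = []
    if len(entries) != 1:
        problems.append(f"expected exactly 1 journal entry, found {len(entries)}")
        return problems
    lines = entries[0]["lines"]
    debits = sum(l["debit"] for l in lines)
    credits = sum(l["credit"] for l in lines)
    if debits != credits:
        problems.append(f"imbalanced: debit {debits} != credit {credits}")
    if any(l["branch_id"] != order["branch_id"] for l in lines):
        problems.append("journal line branch_id does not match the order")
    return problems
```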

If any leg is missing or imbalanced, **stop and escalate**. Do NOT run any "fix" — accounting integrity rules are critical and any patch must go through engineering.

---

## 8. How to verify inventory movements

The dashboard's **Inventory signals** section shows `recipe_consume_today`, `recipe_void_today`, `wastage_movements_today`, `adjustment_movements_today`, `low_stock_count`, `production_done_today`.

To confirm an order's recipe consumption is correct:

1. Find the order's `dine_order_lines` IDs.
2. For each line's `menu_item_id`, look at its `recipes` + `recipe_components`.
3. Expect a `stock_movements` row per recipe component with `ref_type='recipe_consume'`, qty matching `recipe_components.qty × line.qty`, and `branch_id` matching the order.
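
The expected movement rows for one line can be sketched as follows (field names follow the text; the row shapes are assumptions):

```python
def expected_movements(line: dict, components: list[dict]) -> list[dict]:
    """One ref_type='recipe_consume' row per recipe component,
    qty = recipe_components.qty x dine_order_lines.qty, branch from the order."""
    return [
        {
            "ref_type": "recipe_consume",
            "ingredient_id": c["ingredient_id"],
            "qty": c["qty"] * line["qty"],
            "branch_id": line["branch_id"],
        }
        for c in components
    ]
```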

The Phase 16 test `DineInventoryAccuracyTest` locks this invariant at the dashboard-signal level.

For void:
- Voiding a non-paid line → emits a symmetric `recipe_void` movement (Phase 1).
- Voiding a paid order → emits a refund journal mirror but does **NOT** reverse inventory (food was consumed). This is correct behaviour.

If the on-hand drifts from expectations, escalate — do not run an adjustment without engineering review.

---

## 9. Common incident scenarios

### "POS suddenly slow"
1. Branch Health → is the **Slow requests** pill rose? → drill into Pilot Diagnostics → the "Slowest requests" card shows route + duration.
2. Is the **Queue backlog** pill rose? → workers are behind. Consider `php artisan queue:restart` on the host.
3. Does the Devices panel show the POS itself as online?
4. Export diagnostics, attach to ticket.

### "Receipt didn't print"
1. Printer jobs → find the failed row.
2. Once the printer is fixed at the branch, press **Mark recovered**.
3. The cashier can rerun "Print check" from the order screen for the actual reprint — the support action does **not** auto-reprint.

### "Order looks stuck in 'sent_to_kitchen' for an hour"
1. Stuck orders → "Open > 60m" bucket.
2. Verify with the branch — was it forgotten, or does it need cancelling?
3. The cashier closes it on the POS. Support never closes orders directly.

### "Tablet went offline"
1. Devices panel → row in red.
2. If the device is genuinely offline, leave the row alone; it flips back to online as soon as the device resumes heartbeats.
3. If it's a permanent replacement, **Unregister** it.
4. If it's "looks stale, not sure", **Deactivate** it (auto-reactivates on next beat).

### "Sync failures piling up"
1. Sync failures panel → check the dominant `error_code`.
2. `network` → branch likely lost connectivity; offline outbox should drain when it comes back.
3. `auth` → token might have expired; user re-login resolves.
4. `validation` → escalate; client and server disagree on shape.
5. Press **Retry** on a few rows as a hint to the outbox; do not press it dozens of times in a loop.
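
The triage above condenses into a lookup (hypothetical helper; any code outside these three should be escalated):

```python
# Mirrors the scenario steps above; the function and table are illustrative.
SYNC_TRIAGE = {
    "network": "branch connectivity; offline outbox drains on reconnect",
    "auth": "token expired; user re-login resolves",
    "validation": "escalate: client and server disagree on payload shape",
}

def triage(error_code: str) -> str:
    return SYNC_TRIAGE.get(error_code, "unknown code: escalate with diagnostics export")
```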

### "Cash drawer count doesn't match"
- This is **always** a cashier-flow / shift-close issue, not a Dine support issue. Loop in the Pay/POS module owner.

---

## 10. Escalation rules

Escalate to engineering immediately if:

- Any journal line is out of balance (debits ≠ credits).
- An order has `status=paid` but no `journal_entries` row with `source='dine_order'` AND `source_ref=order.id`.
- Inventory `on_hand` goes negative.
- A `recipe_consume` row exists without a matching `dine_order_line`.
- A duplicate `journal_entries` row exists for the same `(source, source_ref)`.
- The Phase-10 alert engine fires `critical` severity for `stuck-tickets` (≥5).
- Sync failures stay unrecovered for >30 minutes after multiple retries.
- Devices cluster offline at the same minute (suggests upstream outage, not per-device issue).
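
Two of the rules above are mechanical enough to sketch (row shapes are assumptions; the real checks would run against tenant DB tooling):

```python
from collections import Counter

def escalation_flags(journal_entries: list[dict], inventory_levels: list[dict]) -> list[str]:
    """Sketch of two checks: duplicate (source, source_ref) journal entries
    and negative on_hand. Returns human-readable flags; empty means no hit."""
    flags = []
    dupes = Counter((e["source"], e["source_ref"]) for e in journal_entries)
    for key, n in dupes.items():
        if n > 1:
            flags.append(f"duplicate journal entry for {key} (x{n})")
    for lvl in inventory_levels:
        if lvl["on_hand"] < 0:
            flags.append(f"negative on_hand for item {lvl['item_id']}")
    return flags
```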

When escalating: include the diagnostics export, the specific entity IDs, and the timeline of operator actions.

---

## 11. What NOT to do

The following actions are **never appropriate from the support panel**:

- ❌ Run raw SQL **writes** against `dine_orders`, `dine_order_lines`, `journal_entries`, `journal_lines`, `stock_movements`, or `inventory_levels`. No exceptions — read-only SELECTs for verification (section 7) are the only sanctioned DB access.
- ❌ Force `retried_ok=true` to "make sync look better".
- ❌ Force `status=paid` on an order without going through the cashier `/pay` endpoint.
- ❌ Press the printer "Mark recovered" button to stop alerts when the printer is genuinely still down — that hides the problem from operators.
- ❌ Hard-delete devices to "fix" a count mismatch — use Deactivate first.
- ❌ Modify accounting data of any kind. Ever. Even with engineering approval, accounting corrections go through reversal entries posted via the cashier flow.

---

## Endpoint reference (read-only)

| Route | Purpose |
|---|---|
| `GET /dine/dashboard?branch_id={id\|all}` | Six-section dashboard payload + alerts (Phase 8/10) |
| `GET /dine/devices?branch_id={id\|all}` | Device list with online/offline (Phase 11) |
| `GET /dine/sync-failures?branch_id={id\|all}` | Today's sync failures (Phase 12) |
| `GET /dine/printer-jobs?branch_id={id\|all}` | Today's printer jobs (Phase 13) |
| `GET /dine/runtime-diagnostics?branch_id={id\|all}` | API failures, slow requests, queue snapshot, recent timeline (Phase 21) |
| `GET /dine/support/branch-health?branch_id={id\|all}` | Compact TL;DR health summary (Phase 22.1) |
| `GET /dine/support/diagnostics-export?branch_id={id\|all}` | JSON export (Phase 22.2) |
| `GET /dine/support/stuck-orders?branch_id={id\|all}` | Read-only stuck-orders buckets (Phase 22.6) |

## Endpoint reference (mutating, support-safe)

| Route | Effect |
|---|---|
| `POST /dine/sync-failures/{id}/retry` | Increment retry_count, emit retry_requested event. Does NOT flip retried_ok. (Phase 22.3) |
| `POST /dine/printer-jobs/{id}/mark-recovered` | Flip failed → succeeded on the PRINT row only. Does NOT reprint. (Phase 22.4) |
| `POST /dine/devices/{id}/deactivate` | Soft-deactivate. Auto-reactivates on next heartbeat. (Phase 22.5) |
| `PATCH /dine/devices/{id}` | Rename device. (Phase 17) |
| `DELETE /dine/devices/{id}` | Hard-delete device. **Permanent.** (Phase 17) |

Every endpoint above is gated by the `module:dine` middleware and tenant-scoped via `BelongsToTenant`. Cross-tenant calls return 404 with no info leak.

---

*End of playbook. Last updated for Phase 22.7. Open a doc PR if a scenario surfaces that this playbook doesn't cover.*
