🧾 Vision Pipeline V1 – PRD

Authoritative specification for the end-to-end image-to-collection pipeline (Shoot → Share → Review). Written so that AI agents and new contributors can ramp up quickly.

1 · Why This Exists

Collectors want a one-tap way to add cards to their digital collection. The pipeline turns a raw phone photo into structured user_cards rows with minimal user effort and latency.

2 · User Flow (Happy Path)

  1. Upload – user clicks the Process Scan button in the web app, selects an image, and submits the form (POST /api/scans).
  2. Auto-crunch – the Python worker picks up the queued job, runs detection & matching.
  3. Ping – Supabase Realtime notifies the front-end; badge appears within ≈10 s.
  4. Review Modal – user confirms or edits predictions; “Accept All” is expected to cover ≈95 % of cases.
  5. Binder Drop-zone – optional bulk attach to a binder (if any exist).
  6. Toast / Dopamine – “✓ 12 cards added · 1 min saved”. Analytics event fires.

3 · Technical Stack & Responsibilities

  • Supabase Storage – raw images (bucket: scans), card crops, summary images.
  • job_queue table + dequeue_and_start_job() RPC – single-row locking for workers.
  • Python Worker (worker/worker.py) – detection, CLIP-based identification, enrichment and persistence.
  • Computer Vision – YOLOv8 (ultralytics) with weights pokemon_cards_trained.pt.
  • OCR / LLM Assist – PaddleOCR or Tesseract (future); currently stubbed by mock_enrich_card().
  • External Card Data – PokémonTCG.io API for ground-truth metadata & prices.
  • Supabase Realtime – listeners on scans status transitions.
  • Next.js Front-end – Review UI modal, optimistic CRUD (Factorio Pattern).
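The single-row locking behind job_queue and dequeue_and_start_job() can be illustrated with a small stand-in. This is a sketch only: it uses an in-memory SQLite table and BEGIN IMMEDIATE so it runs without Supabase, and the table columns and the Python function are hypothetical mirrors of the RPC, whose body is not part of this PRD. In Postgres the same claim-one-row guarantee is commonly obtained with a SELECT ... FOR UPDATE SKIP LOCKED subquery inside the RPC.

```python
from __future__ import annotations

import sqlite3
from typing import Optional


def make_queue() -> sqlite3.Connection:
    # isolation_level=None disables implicit transactions so BEGIN/COMMIT are explicit.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE job_queue ("
        " id INTEGER PRIMARY KEY,"
        " status TEXT NOT NULL DEFAULT 'pending',"
        " payload TEXT)"
    )
    return conn


def enqueue(conn: sqlite3.Connection, payload: str) -> int:
    cur = conn.execute("INSERT INTO job_queue (payload) VALUES (?)", (payload,))
    return cur.lastrowid


def dequeue_and_start_job(conn: sqlite3.Connection) -> Optional[tuple]:
    """Hypothetical mirror of the RPC: claim the oldest pending job atomically."""
    # BEGIN IMMEDIATE takes SQLite's write lock up front, so no other
    # connection can claim the same row between the SELECT and the UPDATE.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id, payload FROM job_queue"
            " WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE job_queue SET status = 'processing' WHERE id = ?", (row[0],)
            )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return row  # None when the queue is empty
```

Each call claims at most one pending row and flips it to processing in the same transaction, so two workers never receive the same job.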

4 · Data Model Touchpoints

  • scan_uploads – raw request; statuses: pending | processing | failed | completed.
  • job_queue (schema-owned) – async command runner (enum: pending | processing | completed | failed).
  • scans – user-visible processing entity (progress 0-100 %, summary image).
  • card_detections – one per bounding box (bbox, crop_url, guess_card_id, confidence).
  • cards – canonical catalogue; seed via PokémonTCG.io; upsert on miss.
  • user_cards – ownership; upsert on (user_id, card_id).
  • pipeline_review_items – low-confidence items surfaced in Review UI.
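The status columns on scan_uploads and job_queue share the same four values. A minimal sketch of encoding them in the worker, assuming a one-way lifecycle (pending → processing → completed or failed) with no retries; the enum and the transition table are illustrative, not an existing module.

```python
from enum import Enum


class Status(str, Enum):
    """Values shared by scan_uploads.status and the job_queue enum."""
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumed lifecycle: claimed once, then exactly one terminal state (no retries).
ALLOWED_TRANSITIONS = {
    Status.PENDING: {Status.PROCESSING},
    Status.PROCESSING: {Status.COMPLETED, Status.FAILED},
    Status.COMPLETED: set(),
    Status.FAILED: set(),
}


def transition(current: Status, target: Status) -> Status:
    """Return target if the move is legal, otherwise raise ValueError."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal status change: {current.value} -> {target.value}")
    return target
```

Because the enum subclasses str, a raw column value parses directly with Status(row["status"]), and members compare equal to the plain strings stored in the database.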

5 · Pipeline Algorithm (Worker)

1. dequeue job → mark 'processing'
2. create row in scans (progress 10 %)
3. download original image
4. run YOLOv8 detection → top N boxes (confidence ≥ 0.25)
5. save crops + summary image to Storage (progress 50 %)
6. for each crop:
   a. enrich via PokémonTCG.io (TODO: replace mock_enrich_card())
   b. upsert into cards → obtain card_id
   c. insert card_detections row
   d. upsert into user_cards
7. mark scans.status = 'ready' (progress 100 %)
8. update job + scan_uploads statuses → trigger Realtime event
9. write verbose JSON log to worker/output
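The worker algorithm above can be sketched as a single function with its I/O injected. Everything here is hypothetical: Box, FakeDb, the callables and the default top_n = 20 stand in for the ultralytics results, Storage uploads and Supabase writes, so the control flow (threshold, progress checkpoints, per-crop upserts) can be read and tested in isolation. Steps 1 and 8 (queue bookkeeping and Realtime) are left to the caller, and the summary image is omitted.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

MIN_CONFIDENCE = 0.25  # detection threshold from step 4


@dataclass
class Box:
    xyxy: tuple[float, float, float, float]  # bounding box in pixel coordinates
    confidence: float


def select_boxes(boxes: list[Box], top_n: int, min_conf: float = MIN_CONFIDENCE) -> list[Box]:
    """Step 4: drop boxes below min_conf, keep the top_n most confident."""
    kept = sorted((b for b in boxes if b.confidence >= min_conf),
                  key=lambda b: b.confidence, reverse=True)
    return kept[:top_n]


@dataclass
class FakeDb:
    """Hypothetical in-memory stand-in for scans, cards, card_detections, user_cards."""
    progress: list[int] = field(default_factory=list)
    scan_status: str = "processing"
    cards: dict[str, str] = field(default_factory=dict)            # catalogue key -> card_id
    detections: list[dict] = field(default_factory=list)
    user_cards: set[tuple[str, str]] = field(default_factory=set)  # (user_id, card_id)


def process_scan(
    user_id: str,
    storage_path: str,
    db: FakeDb,
    download: Callable[[str], bytes],
    detect: Callable[[bytes], list[Box]],
    save_crop: Callable[[bytes, Box], str],
    enrich: Callable[[str], str],
    top_n: int = 20,
) -> dict:
    db.progress.append(10)                            # step 2: scans row exists
    image = download(storage_path)                    # step 3
    boxes = select_boxes(detect(image), top_n)        # step 4
    crop_urls = [save_crop(image, b) for b in boxes]  # step 5
    db.progress.append(50)
    for box, crop_url in zip(boxes, crop_urls):       # step 6
        key = enrich(crop_url)                        # 6a: returns a catalogue key
        card_id = db.cards.setdefault(key, f"card-{len(db.cards) + 1}")  # 6b: upsert
        db.detections.append({"bbox": box.xyxy, "crop_url": crop_url,   # 6c
                              "guess_card_id": card_id, "confidence": box.confidence})
        db.user_cards.add((user_id, card_id))         # 6d: upsert on (user_id, card_id)
    db.scan_status = "ready"                          # step 7
    db.progress.append(100)
    # Summary for the step 9 JSON log.
    return {"storage_path": storage_path, "detections": len(db.detections),
            "cards": sorted(db.cards)}
```

Keying the catalogue upsert on the enrichment result means two crops of the same card share one card_id, and the (user_id, card_id) set keeps user_cards idempotent.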

6 · Known Gaps (as of V1)

  • RPC Mismatch – dequeue_and_start_job must return scan_upload_id + JSON payload.storage_path.
  • Worker Null-Safety – guard against a null job.payload.
  • Model Accuracy – retraining needed; current F1 ≈ 0.78.
  • OCR / LLM – placeholder mock_enrich_card(); replace with a real API.
  • Review UI – Accept All + Binder Picker not implemented.
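The first two gaps can be closed with one defensive parse of the RPC result before any work starts. A sketch, assuming the row arrives as a dict with id, scan_upload_id and payload keys and that payload may be either a dict or a JSON string; ScanJob and parse_job are hypothetical names.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ScanJob:
    job_id: Any
    scan_upload_id: str
    storage_path: str


def parse_job(row: Optional[dict]) -> Optional[ScanJob]:
    """Validate one dequeued row; None means the queue was empty."""
    if not row:  # RPC returned no row
        return None
    payload = row.get("payload")
    if isinstance(payload, str):  # assumption: payload may arrive as a JSON string
        payload = json.loads(payload)
    if not isinstance(payload, dict):  # covers a null payload
        raise ValueError(f"job {row.get('id')}: payload is missing or not an object")
    scan_upload_id = row.get("scan_upload_id")
    storage_path = payload.get("storage_path")
    if not scan_upload_id or not storage_path:
        raise ValueError(
            f"job {row.get('id')}: needs scan_upload_id and payload.storage_path"
        )
    return ScanJob(row.get("id"), str(scan_upload_id), str(storage_path))
```

Raising ValueError (rather than crashing on an attribute access against None) gives the worker a single place to mark the job and its scan_upload failed.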

7 · Incremental Roadmap

  1. Fix RPC + payload, patch worker → restore green path.
  2. Add PokémonTCG.io fetch + caching.
  3. Retrain YOLO on a ~2× larger dataset; push pokemon_cards_trained_v2.pt.
  4. Wire Review Modal + Accept All.
  5. Binder drop-zone bulk attach.
  6. Add metrics & DLQ for failed jobs.
  7. Long-term: move CPU-heavy inference to serverless GPUs (Modal / RunPod).
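For roadmap item 2, a small time-based cache in front of the PokémonTCG.io lookup avoids repeated network calls when one photo contains several copies of the same card. A sketch under assumptions: fetch is whatever function wraps the API (for example a GET against the v2 /cards search endpoint with a q query), TtlCache is a hypothetical name, and the 24-hour TTL is a placeholder, not a measured value.

```python
from __future__ import annotations

import time
from typing import Callable, Optional


class TtlCache:
    """Memoise card lookups so repeated crops of one card hit the network once."""

    def __init__(
        self,
        fetch: Callable[[str], Optional[dict]],
        ttl_s: float = 24 * 3600,  # placeholder TTL
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_s
        self._clock = clock
        self._entries: dict[str, tuple[float, Optional[dict]]] = {}

    def get(self, query: str) -> Optional[dict]:
        now = self._clock()
        hit = self._entries.get(query)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh entry: serve from memory
        value = self._fetch(query)  # miss or expired: call the API wrapper
        self._entries[query] = (now, value)
        return value
```

Misses (None) are cached too, so an unknown card does not trigger a fresh request for every crop until the entry expires. An in-process dict is enough for a single worker; several workers would need a shared store.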