🧾 Vision Pipeline V1 – PRD
Authoritative specification for the end-to-end image-to-collection pipeline (Shoot → Share → Review). Optimised for AI agents and new contributors to ramp up quickly.
1 · Why This Exists
Collectors want a one-tap way to add cards to their digital collection. The pipeline turns a raw phone photo into structured user_cards rows with minimal user effort and latency.
2 · User Flow (Happy Path)
- Upload – user clicks the Process Scan button in the web app, selects an image, and submits the form (POST /api/scans).
- Auto-crunch – the Python worker picks up the queued job, runs detection & matching.
- Ping – Supabase Realtime notifies the front-end; badge appears within ≈10 s.
- Review Modal – user confirms or edits predictions. “Accept All” handles 95 % of cases.
- Binder Drop-zone – optional bulk attach to a binder (if any exist).
- Toast / Dopamine – “✓ 12 cards added · 1 min saved”. Analytics event fires.
3 · Technical Stack & Responsibilities
- Supabase Storage – raw images (bucket: scans), card crops, summary images.
- job_queue table + dequeue_and_start_job() RPC – single-row locking for workers.
- Python Worker (worker/worker.py) – detection, enrichment, persistence with CLIP-based identification.
- Computer Vision – YOLOv8 (ultralytics) weights pokemon_cards_trained.pt.
- OCR / LLM Assist – PaddleOCR or Tesseract (future); placeholder mock_enrich_card().
- External Card Data – PokémonTCG.io API for ground-truth metadata & prices.
- Supabase Realtime – listeners on scans status transitions.
- Next.js Front-end – Review UI modal, optimistic CRUD (Factorio Pattern).
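
The "single-row locking" responsibility means each queued job is claimed by exactly one worker. The sketch below simulates that guarantee in memory with a `threading.Lock`. It is not the real RPC: `InMemoryJobQueue`, `Job`, and `run_workers` are hypothetical names, and the actual `dequeue_and_start_job()` runs inside Postgres, where a row lock would play the role the Python lock plays here.

```python
import threading
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Job:
    id: int
    payload: Optional[dict]
    status: str = "pending"  # pending | processing | completed | failed


@dataclass
class InMemoryJobQueue:
    """Hypothetical stand-in for the job_queue table; not the real schema."""
    jobs: list = field(default_factory=list)
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def dequeue_and_start_job(self) -> Optional[Job]:
        # The lock stands in for the row lock: only one caller at a time may
        # find a 'pending' job and flip it to 'processing'.
        with self._lock:
            for job in self.jobs:
                if job.status == "pending":
                    job.status = "processing"
                    return job
        return None  # nothing left to claim


def run_workers(queue: InMemoryJobQueue, n_workers: int = 4) -> list:
    """Start several worker threads and return the ids each one claimed."""
    claimed = []
    claimed_lock = threading.Lock()

    def worker():
        while (job := queue.dequeue_and_start_job()) is not None:
            with claimed_lock:
                claimed.append(job.id)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return claimed
```

If the claim step were not atomic, two workers could both see the same job as pending and process the same image twice.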
4 · Data Model Touchpoints
- scan_uploads – raw request; statuses: pending | processing | failed | completed.
- job_queue (schema-owned) – async command runner (enum: pending | processing | completed | failed).
- scans – user-visible processing entity (progress 0-100 %, summary image).
- card_detections – one per bounding box (bbox, crop_url, guess_card_id, confidence).
- cards – canonical catalogue; seed via PokémonTCG.io; upsert on miss.
- user_cards – ownership; upsert on (user_id, card_id).
- pipeline_review_items – low-confidence items surfaced in Review UI.
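
To illustrate what "upsert on (user_id, card_id)" means for user_cards, here is a minimal sketch using SQLite from the Python standard library. The production database is Supabase (Postgres), so treat this as an analogy only. The `last_scan_id` column, the `upsert_user_card` helper, and the sample ids are assumptions made for the example, not part of the documented schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE user_cards (
        user_id      TEXT NOT NULL,
        card_id      TEXT NOT NULL,
        last_scan_id TEXT,              -- hypothetical column, illustration only
        PRIMARY KEY (user_id, card_id)  -- the conflict target for the upsert
    )
    """
)


def upsert_user_card(conn, user_id, card_id, scan_id):
    """Insert the ownership row, or refresh it if the pair already exists."""
    conn.execute(
        """
        INSERT INTO user_cards (user_id, card_id, last_scan_id)
        VALUES (?, ?, ?)
        ON CONFLICT (user_id, card_id)
        DO UPDATE SET last_scan_id = excluded.last_scan_id
        """,
        (user_id, card_id, scan_id),
    )
    conn.commit()


# Scanning the same card twice leaves a single ownership row.
upsert_user_card(conn, "u1", "base1-4", "scan-a")
upsert_user_card(conn, "u1", "base1-4", "scan-b")
rows = conn.execute(
    "SELECT user_id, card_id, last_scan_id FROM user_cards"
).fetchall()
```

Keying the conflict on the pair makes re-processing a scan idempotent with respect to ownership: a retry updates the existing row instead of creating a duplicate.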
5 · Pipeline Algorithm (Worker)
1. dequeue job → mark 'processing'
2. create row in scans (progress 10 %)
3. download original image
4. run YOLOv8 detection → top N boxes (confidence ≥ 0.25)
5. save crops + summary image to Storage (progress 50 %)
6. for each crop:
   a. enrich via PokémonTCG.io (TODO replace mock)
   b. upsert into cards → obtain card_id
   c. insert card_detections row
   d. upsert into user_cards
7. mark scans.status = 'ready' (progress 100 %)
8. update job + scan_uploads statuses → trigger Realtime event
9. write verbose JSON log to worker/output
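
Steps 2 to 7 can be sketched as one function with every external dependency injected, which keeps the control flow testable without Supabase or a model. This is an illustrative outline, not the contents of worker/worker.py. `InMemoryStore`, `Detection`, `process_job`, the crop path layout, the `user_id` job field, and the reuse of `scan_upload_id` as the scan id are all assumptions. Steps 1, 8, and 9 (dequeue, status fan-out, logging) and the summary image are left out.

```python
from dataclasses import dataclass, field

MIN_CONFIDENCE = 0.25  # detection threshold from step 4


@dataclass
class Detection:
    bbox: tuple        # (x1, y1, x2, y2) in pixels
    confidence: float


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for Supabase Storage plus the Postgres tables."""
    cards: dict = field(default_factory=dict)            # card_id -> metadata
    card_detections: list = field(default_factory=list)
    user_cards: set = field(default_factory=set)         # (user_id, card_id)

    def download(self, storage_path):
        return f"<image bytes for {storage_path}>"

    def save_crop(self, scan_id, index, bbox):
        return f"scans/{scan_id}/crop_{index}.jpg"       # made-up path layout

    def upsert_card(self, card):
        self.cards[card["id"]] = card
        return card["id"]


def process_job(job, store, detect, enrich):
    """Run steps 2-7 for one dequeued job and return the final scan record."""
    scan = {"id": job["scan_upload_id"], "status": "processing", "progress": 10}
    image = store.download(job["payload"]["storage_path"])            # step 3
    detections = [d for d in detect(image)
                  if d.confidence >= MIN_CONFIDENCE]                   # step 4
    crop_urls = [store.save_crop(scan["id"], i, d.bbox)
                 for i, d in enumerate(detections)]                    # step 5
    scan["progress"] = 50
    for det, crop_url in zip(detections, crop_urls):                   # step 6
        card = enrich(crop_url)                                        # 6a
        card_id = store.upsert_card(card)                              # 6b
        store.card_detections.append({                                 # 6c
            "bbox": det.bbox, "crop_url": crop_url,
            "guess_card_id": card_id, "confidence": det.confidence,
        })
        store.user_cards.add((job["user_id"], card_id))                # 6d
    scan.update(status="ready", progress=100)                          # step 7
    return scan
```

In this shape, `detect` could later wrap the YOLOv8 model and `enrich` could wrap the PokémonTCG.io lookup without changing the loop itself.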
6 · Known Gaps (as of V1)
- RPC Mismatch – dequeue_and_start_job must return scan_upload_id + JSON payload.storage_path.
- Worker Null-Safety – guard when job.payload is null.
- Model Accuracy – retraining needed; current F1 ≈ 0.78.
- OCR / LLM – placeholder mock_enrich_card(); replace with real API.
- Review UI – Accept All + Binder Picker not implemented.
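
The null-safety gap could be closed with a small validation step that runs before any image work starts. The sketch below is one possible guard, assuming the job arrives as a dict with `id` and `payload` keys. `InvalidJobError` and `get_storage_path` are hypothetical names, not existing worker code.

```python
from typing import Optional


class InvalidJobError(ValueError):
    """Raised when a dequeued job cannot be processed as-is."""


def get_storage_path(job: Optional[dict]) -> str:
    """Return payload.storage_path, or raise a descriptive error."""
    if job is None:
        raise InvalidJobError("dequeue returned no job")
    payload = job.get("payload")
    if not isinstance(payload, dict):
        # Covers payload == None as well as unexpected types.
        raise InvalidJobError(f"job {job.get('id')!r} has no payload")
    storage_path = payload.get("storage_path")
    if not storage_path:
        raise InvalidJobError(f"job {job.get('id')!r} payload lacks storage_path")
    return storage_path
```

Raising a dedicated exception lets the worker mark the job as failed with a readable reason, where an unguarded access would crash with a `TypeError` on `None`.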
7 · Incremental Roadmap
- Fix RPC + payload, patch worker → restore green path.
- Add PokémonTCG.io fetch + caching.
- Retrain YOLO with ~2× dataset; push pokemon_cards_trained_v2.pt.
- Wire Review Modal + Accept All.
- Binder drop-zone bulk attach.
- Add metrics & DLQ for failed jobs.
- Long-term: move heavy CPU to serverless GPU (Modal / RunPod).
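
For the "fetch + caching" roadmap item, one simple option is a time-bounded in-process cache wrapped around whatever function ends up calling PokémonTCG.io. The sketch below assumes that approach. `CardCache`, the one-hour default TTL, and the injected `fetch` callable are illustrative choices, and no real API call is made.

```python
import time


class CardCache:
    """Cache card metadata by id so repeated crops skip the remote lookup."""

    def __init__(self, fetch, ttl_seconds=3600.0, clock=time.monotonic):
        self._fetch = fetch          # callable: card_id -> metadata dict
        self._ttl = ttl_seconds      # assumed freshness window
        self._clock = clock          # injectable for tests
        self._entries = {}           # card_id -> (stored_at, metadata)
        self.misses = 0

    def get(self, card_id):
        now = self._clock()
        entry = self._entries.get(card_id)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]          # fresh hit, no remote call
        self.misses += 1
        card = self._fetch(card_id)
        self._entries[card_id] = (now, card)
        return card
```

Since the external API also supplies prices, the TTL bounds how stale a cached price can be. A cache shared across workers would need an external store instead of a per-process dict.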