πŸ“š Database Design & Searchability

Authoritative reference for core tables, relationships, and what fields are searchable (text and vector).

Overview

Project Arceus stores canonical card data, user scans, YOLO detections, and CLIP embeddings in Supabase (Postgres). This page defines the entities, relationships, and which fields are used for search in the app and worker.

Core Entities

  • cards β€” Canonical catalogue. Key columns: id, name, set_code, card_number, rarity, image_url, pricing/meta.
  • card_embeddings β€” pgvector-backed CLIP embeddings. Columns: card_id β†’ embedding vector(512), image_url, model_version.
  • scans / scan_uploads β€” Upload events and progress/status for binder photos.
  • card_detections β€” YOLO detections and guessed IDs. Columns: scan_id, bbox, crop_url, guess_card_id, confidence.
  • user_cards β€” Inventory rows per user. Columns: user_id, card_id, quantity, condition.

Text Search (Cards)

The UI uses a fast text search RPC over cards for manual correction and browsing.

  • RPC: search_cards(search_term TEXT)
  • Searchable fields: name, set_code, card_number
  • Ordering: exact-prefix matches first, then partial, then by name β†’ set_code β†’ card_number
  • Limit: 20 results; query trimmed; min length 2
-- supabase/migrations/20250721000002_recreate_card_search_function.sql CREATE OR REPLACE FUNCTION search_cards(search_term TEXT) RETURNS TABLE (id UUID, name TEXT, set_code TEXT, card_number TEXT, image_url TEXT) AS $$ BEGIN IF search_term IS NULL OR length(trim(search_term)) < 2 THEN RETURN; END IF; RETURN QUERY SELECT c.id, c.name, c.set_code, c.card_number, c.image_url FROM cards c WHERE c.name ILIKE '%' || trim(search_term) || '%' OR c.set_code ILIKE '%' || trim(search_term) || '%' OR c.card_number ILIKE '%' || trim(search_term) || '%' ORDER BY CASE WHEN c.name ILIKE trim(search_term) || '%' THEN 1 WHEN c.name ILIKE '%' || trim(search_term) || '%' THEN 2 ELSE 3 END, c.name, c.set_code, c.card_number LIMIT 20; END; $$ LANGUAGE plpgsql;

Frontend hook: hooks/useCardSearch.ts β†’ /api/cards/search?q=… (supports rarity aliases like sir, sar, etc.).

Vector Search (CLIP)

The worker identifies cards via CLIP embeddings against card_embeddings using pgvector.

  • Table: card_embeddings(card_id TEXT PRIMARY KEY, embedding vector(512)) with ivfflat index.
  • RPC: search_similar_cards(query_embedding vector(512), similarity_threshold float, max_results int) (returns distance; worker converts to similarity).
  • Worker: worker/clip_lookup.py uses .rpc('search_similar_cards', ...) to fetch best matches.
-- supabase/migrations/20250115000000_create_card_embeddings_table.sql CREATE EXTENSION IF NOT EXISTS vector; CREATE TABLE card_embeddings ( card_id TEXT PRIMARY KEY, embedding vector(512) NOT NULL, created_at TIMESTAMPTZ DEFAULT now(), updated_at TIMESTAMPTZ DEFAULT now() ); CREATE INDEX card_embeddings_embedding_idx ON card_embeddings USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

Search Use-Cases by Feature

  • Scan Review – Correction Modal: UI text search to replace guess_card_id using name, set_code, card_number.
  • Worker Identification: CLIP vector search; stores best-match card_id on card_detections.guess_card_id with similarity as confidence.
  • Inventory: Read joins via user_cards.card_id β†’ cards; not directly searchable yet in UI.

Field Reference – What Must Be Searchable

  • Card text: cards.name (trigram index), cards.set_code, cards.card_number
  • Card visuals: card_embeddings.embedding (pgvector cosine)
  • Optional filters (future): rarity, types, hp, artist

API & Hooks Touchpoints

  • app/api/cards/search/route.ts – local JSON card search (aliases for rarities).
  • hooks/useCardSearch.ts – debounced UI search hook.
  • services/cards.ts – joins card_detections ↔ cards for review UI.
  • worker/clip_lookup.py – CLIP embedding search via search_similar_cards.

Open Questions / Next

  • Add rarity/type filters to text search RPC?
  • Expose a server route that proxies search_cards for SSR and RLS isolation?
  • Unify local JSON search vs DB RPC when catalogue is fully synced.