Worker Pipeline Architecture
Production-ready Pokemon card processing pipeline with CLIP similarity search and autonomous recovery.
System Overview
The Project Arceus worker pipeline combines YOLO detection with CLIP similarity search for fast, accurate card identification with autonomous operation.
Performance Metrics
- Processing: CLIP-only identification
- Speed: Fast embedding-based matching
- Cost: Free processing (no API calls)
- Uptime: 99.9%+ with auto-recovery
Autonomous Features
- Auto-Recovery: 30s detection, 10s resolution
- Smart Retries: Max 3 attempts with backoff
- Process Monitoring: Auto-restart on crash
- Health Checks: Continuous system monitoring
Processing Pipeline
Step-by-Step Process
- 1. Image Upload: User uploads Pokemon card images (JPEG, PNG, HEIC). The frontend stores the images in Supabase Storage and creates a job queue entry.
- 2. Job Dequeue: The Python worker polls for new jobs every 5 seconds. It uses dequeue_job() with a visibility timeout for atomic processing.
- 3. YOLO Detection: A custom YOLOv8 model detects individual cards. It is trained specifically for Pokemon cards, with high detection accuracy.
- 4. CLIP Similarity: Fast embedding-based card matching, using the OpenAI CLIP ViT-B-32-quickgelu model with Pokemon card embeddings.
- 5. Card Database Lookup: Identified cards are matched to the database, with automatic card creation and inventory updates with confidence tracking.
- 6. Result Storage: Detection results are saved and the job status is updated. Bounding boxes, confidence scores, and processing metadata are stored.
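The steps above can be sketched as a simple polling loop. This is a minimal sketch under stated assumptions, not the project's actual worker: `InMemoryQueue`, `detect_cards`, `identify_card`, and `run_worker` are hypothetical stand-ins for the database-backed queue, the YOLOv8 model, and the CLIP matcher. Only the `dequeue_job()` name, the visibility timeout idea, and the 5-second poll interval come from the description.

```python
import time
from dataclasses import dataclass, field

POLL_INTERVAL_S = 5  # from the description: the worker polls every 5 seconds


@dataclass
class Job:
    id: int
    storage_path: str
    status: str = "pending"
    visible_at: float = 0.0  # hidden from other workers until this timestamp
    results: list = field(default_factory=list)


class InMemoryQueue:
    """Hypothetical stand-in for the database-backed job queue."""

    def __init__(self, jobs, visibility_timeout_s=600):
        self.jobs = list(jobs)
        self.visibility_timeout_s = visibility_timeout_s

    def dequeue_job(self, now):
        # Claim the first unfinished job whose visibility timeout has expired.
        for job in self.jobs:
            if job.status in ("pending", "processing") and job.visible_at <= now:
                job.status = "processing"
                job.visible_at = now + self.visibility_timeout_s
                return job
        return None


def detect_cards(storage_path):
    # Placeholder for YOLO detection: returns bounding boxes as (x, y, w, h).
    return [(0, 0, 100, 140)]


def identify_card(bbox):
    # Placeholder for CLIP matching: returns (card_id, confidence).
    return ("card-unknown", 0.0)


def process_job(job):
    for bbox in detect_cards(job.storage_path):
        card_id, confidence = identify_card(bbox)
        job.results.append({"bbox": bbox, "card_id": card_id, "confidence": confidence})
    job.status = "completed"


def run_worker(queue, max_polls, sleep=time.sleep, clock=time.time):
    """Poll the queue up to max_polls times; return the number of jobs processed."""
    processed = 0
    for _ in range(max_polls):
        job = queue.dequeue_job(clock())
        if job is None:
            sleep(POLL_INTERVAL_S)  # nothing to do, wait before polling again
            continue
        process_job(job)
        processed += 1
    return processed
```

Passing `sleep` and `clock` as parameters keeps the loop easy to exercise in tests without real delays. A long-running worker would loop indefinitely rather than stop after `max_polls`.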
CLIP Card Identification System
Our production CLIP similarity search system provides fast, cost-effective card identification with high accuracy.
CLIP Similarity Search
- Model: ViT-B-32-quickgelu (optimized)
- Speed: Fast embedding-based matching
- Cost: Free processing (no API calls)
- Database: 19k+ Pokemon card embeddings
- Accuracy: High confidence matches for known cards
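Embedding-based matching of this kind usually comes down to a nearest-neighbour search by cosine similarity. The sketch below illustrates the idea in pure Python with toy 3-dimensional vectors (CLIP ViT-B/32 image embeddings are 512-dimensional). The card IDs, the `best_match` helper, and the 0.8 confidence threshold are illustrative assumptions, not values taken from the project.

```python
import math


def normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def best_match(query, index, min_confidence=0.8):
    """Return (card_id, score) for the closest embedding.

    card_id is None when the best score is below min_confidence
    (an assumed threshold, chosen only for illustration).
    """
    best_id, best_score = None, -1.0
    for card_id, embedding in index.items():
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_id, best_score = card_id, score
    if best_score < min_confidence:
        return None, best_score
    return best_id, best_score


# Toy index: hypothetical card IDs mapped to made-up embeddings.
index = {
    "card-a": [1.0, 0.0, 0.0],
    "card-b": [0.0, 1.0, 0.0],
}
print(best_match([0.9, 0.1, 0.0], index))
```

With an index of 19k+ embeddings, a real implementation would more likely store pre-normalized vectors in a matrix and compute all scores with a single matrix-vector product.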
System Benefits
- Cost-Effective: No API costs, completely free processing
- Fast Performance: Embedding-based similarity search
- Scalable: No per-card costs, unlimited processing
- Reliable: Self-hosted model with local embeddings
Auto-Recovery System
An autonomous monitoring and recovery system removes the need for manual intervention on stuck jobs.
Stuck Job Detection
- Monitor Interval: Every 30 seconds
- Timeout Threshold: 10 minutes in processing
- Detection Query: get_stuck_jobs() function
- Recovery Trigger: Automatic retry or permanent failure
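The detection rule can be illustrated in Python. The real `get_stuck_jobs()` is a database function whose definition is not shown here, so this is only an analogue of the rule: it assumes a job counts as stuck when its status is `processing` and its `updated_at` timestamp is older than the 10-minute threshold. The `find_stuck_jobs` name and the choice of `updated_at` as the reference timestamp are assumptions.

```python
from datetime import datetime, timedelta, timezone

STUCK_THRESHOLD = timedelta(minutes=10)  # from the description


def find_stuck_jobs(jobs, now):
    """Return jobs in 'processing' whose last update is older than the threshold."""
    return [
        job
        for job in jobs
        if job["status"] == "processing" and now - job["updated_at"] > STUCK_THRESHOLD
    ]


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
jobs = [
    {"id": 1, "status": "processing", "updated_at": now - timedelta(minutes=11)},
    {"id": 2, "status": "processing", "updated_at": now - timedelta(minutes=2)},
    {"id": 3, "status": "completed", "updated_at": now - timedelta(hours=1)},
]
stuck = find_stuck_jobs(jobs, now)
print([job["id"] for job in stuck])
```

A monitor would run a check like this on its 30-second interval and hand each stuck job to the retry logic.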
Smart Retry Logic
- Max Retries: 3 attempts per job
- Retry Tracking: Database column with attempt counter
- Exponential Backoff: Increasing delays between retries
- Permanent Failure: After 3 failed attempts
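The retry rules above can be sketched as a small decision function. Only the limit of 3 attempts comes from the description. The 30-second base delay, the 10-minute cap, the doubling schedule, and the `backoff_delay` and `next_action` names are assumptions chosen for illustration.

```python
MAX_RETRIES = 3      # from the description
BASE_DELAY_S = 30    # assumed base delay
MAX_DELAY_S = 600    # assumed upper bound on the delay


def backoff_delay(retry_count):
    """Delay before the next attempt: base * 2^retry_count, capped at MAX_DELAY_S."""
    return min(BASE_DELAY_S * (2 ** retry_count), MAX_DELAY_S)


def next_action(job):
    """Decide whether a failed or stuck job is retried or permanently failed."""
    if job["retry_count"] >= MAX_RETRIES:
        return {"status": "failed", "retry_count": job["retry_count"], "delay_s": None}
    return {
        "status": "pending",
        "retry_count": job["retry_count"] + 1,
        "delay_s": backoff_delay(job["retry_count"]),
    }


for attempts in range(4):
    print(attempts, next_action({"retry_count": attempts}))
```

Capping the delay keeps a job from waiting arbitrarily long if the retry limit is ever raised.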
Process Health Monitoring
- Worker Heartbeat: Regular job processing confirmation
- Process Restart: Auto-restart crashed workers
- Health Metrics: Job queue statistics and trends
- Alert System: Real-time problem notifications
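Auto-restart on crash can be sketched with the standard library's `subprocess` module. This is a minimal sketch, not the project's monitor: the `supervise` function and its `max_restarts` limit are hypothetical, and a crash is assumed to mean a non-zero exit code.

```python
import subprocess
import sys


def supervise(cmd, max_restarts=3):
    """Run cmd and restart it whenever it exits with a non-zero code.

    Returns the number of restarts performed once the process exits cleanly.
    Raises RuntimeError if it still crashes after max_restarts restarts.
    """
    restarts = 0
    while True:
        exit_code = subprocess.run(cmd).returncode
        if exit_code == 0:
            return restarts  # clean exit, nothing to recover
        if restarts >= max_restarts:
            raise RuntimeError(f"process kept crashing (last exit code {exit_code})")
        restarts += 1


if __name__ == "__main__":
    # Example: supervise a trivial command that exits cleanly.
    print(supervise([sys.executable, "-c", "pass"]))
```

A call such as `supervise([sys.executable, "worker.py"])` would wrap the worker in the same way. A production supervisor would also normally wait between restarts and report each crash to the alerting system.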
Database Schema
Key Tables
- id, scan_id, status, created_at, updated_at, visibility_timeout, retry_count, error_message
- id, user_id, storage_path, status, created_at, processing metadata
- id, scan_id, card_id, confidence, bbox coordinates, identification_method, cost
- id, level, message, scan_id, created_at, processing context
Health Monitoring Functions
SELECT * FROM job_queue_health;
Returns job statistics: total, processing, failed, average wait time
SELECT * FROM get_stuck_jobs();
Identifies jobs stuck longer than timeout threshold
SELECT * FROM auto_recover_stuck_jobs();
Automatically retries stuck jobs or marks them as failed
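To make the statistics listed for job_queue_health concrete, here is a Python analogue of how they could be computed. The view's actual SQL is not shown, so this is a sketch under assumptions: the `queue_health` name is hypothetical, and average wait time is assumed to mean the mean age of jobs still in `pending` status.

```python
from datetime import datetime, timedelta, timezone


def queue_health(jobs, now):
    """Summarize queue state: total, processing, failed, and average wait in seconds."""
    # Assumption: "wait time" is how long pending jobs have been in the queue.
    waits = [
        (now - job["created_at"]).total_seconds()
        for job in jobs
        if job["status"] == "pending"
    ]
    return {
        "total": len(jobs),
        "processing": sum(job["status"] == "processing" for job in jobs),
        "failed": sum(job["status"] == "failed" for job in jobs),
        "avg_wait_s": sum(waits) / len(waits) if waits else 0.0,
    }


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
sample_jobs = [
    {"status": "pending", "created_at": now - timedelta(seconds=20)},
    {"status": "pending", "created_at": now - timedelta(seconds=40)},
    {"status": "processing", "created_at": now - timedelta(minutes=1)},
    {"status": "failed", "created_at": now - timedelta(minutes=30)},
]
print(queue_health(sample_jobs, now))
```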
Production Deployment
Complete System Startup
python start_production_system.py
Starts worker + auto-recovery + process monitoring + health checks
Individual Components
cd worker && python worker.py
Main processing worker (YOLO + CLIP identification)
cd worker && python auto_recovery_system.py
Autonomous stuck job recovery monitor
npm run dev
Next.js frontend development server
Environment Configuration
Required: SUPABASE_URL, SUPABASE_SERVICE_ROLE_KEY
Optional: HUGGING_FACE_TOKEN (for model downloads)
Settings: Confidence thresholds, retry counts
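A worker could validate this configuration at startup along the following lines. This is a sketch, not the project's loader: `WorkerConfig` and `load_config` are hypothetical names, and the `CONFIDENCE_THRESHOLD` and `MAX_RETRIES` variable names and their defaults are assumptions, since the text only says that confidence thresholds and retry counts are configurable.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WorkerConfig:
    supabase_url: str
    service_role_key: str
    hugging_face_token: Optional[str]
    confidence_threshold: float
    max_retries: int


def load_config(env=None):
    """Build a WorkerConfig from environment variables, failing fast if required ones are missing."""
    env = os.environ if env is None else env
    required = ("SUPABASE_URL", "SUPABASE_SERVICE_ROLE_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    return WorkerConfig(
        supabase_url=env["SUPABASE_URL"],
        service_role_key=env["SUPABASE_SERVICE_ROLE_KEY"],
        hugging_face_token=env.get("HUGGING_FACE_TOKEN"),  # optional
        # Hypothetical variable names and defaults for the tunable settings.
        confidence_threshold=float(env.get("CONFIDENCE_THRESHOLD", "0.8")),
        max_retries=int(env.get("MAX_RETRIES", "3")),
    )
```

Failing at startup with a list of the missing variables tends to be easier to diagnose than a connection error surfacing later in the processing loop.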
Performance & Monitoring
Cost Analysis
- Processing Cost: $0.00 (completely free)
- No API Dependencies: Self-hosted model
- Scalability: Unlimited processing capacity
- Infrastructure: Only hosting costs
System Metrics
- Training Data: 4-category feedback system
- Confidence Tracking: CLIP similarity scores
- Error Analysis: Misidentification patterns
- Model Performance: Embedding quality monitoring