# cloudflare-workers Architecture

Serverless scraping architecture using Cloudflare Workers, Durable Objects, KV, and Queues.

## Overview
The cloudflare-workers repository implements a distributed scraping system that:
- Processes Skybox inventory data
- Scrapes event listings
- Manages competitor purchases
- Updates ticket prices in real time
## Architecture

### Core Components

#### Scraping Worker

**Entry Point:** Main worker handling scheduled tasks and HTTP requests

Responsibilities:
- Scheduled processing via cron triggers
- HTTP request handling
- Queue processing
- Durable Object coordination
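The worker's three responsibilities above map onto the three handler entry points the Workers runtime exposes. A minimal sketch, assuming a queue binding named `SCRAPE_QUEUE` and a `/trigger` route (both illustrative, not taken from the repository):

```typescript
// Hypothetical worker entry point; binding and route names are assumptions.
interface QueueLike {
  send(body: unknown): Promise<void>;
}

interface Env {
  SCRAPE_QUEUE: QueueLike; // assumed queue producer binding
}

const worker = {
  // HTTP entry point: manual triggers and health checks.
  async fetch(req: Request, env: Env): Promise<Response> {
    const url = new URL(req.url);
    if (url.pathname === "/trigger") {
      await env.SCRAPE_QUEUE.send({ kind: "manual-trigger", at: Date.now() });
      return new Response("enqueued", { status: 202 });
    }
    return new Response("ok");
  },

  // Cron entry point: kicks off a scheduled processing run.
  async scheduled(event: { cron: string }, env: Env): Promise<void> {
    await env.SCRAPE_QUEUE.send({ kind: "scheduled-run", cron: event.cron });
  },

  // Queue consumer: processes batches of scraping tasks.
  async queue(
    batch: { messages: { body: unknown; ack(): void }[] },
    env: Env,
  ): Promise<void> {
    for (const msg of batch.messages) {
      // ...dispatch work, e.g. to a Durable Object per inventory item...
      msg.ack();
    }
  },
};

export default worker;
```

Exporting one object with `fetch`, `scheduled`, and `queue` handlers is the standard module-worker shape; the runtime invokes whichever handler matches the event.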
#### Durable Objects

**Purpose:** Persistent state management and retries

Features:
- Persistent processing state
- Retry logic
- Rate limiting coordination
- State synchronization
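The persistent-state-plus-retry pattern can be sketched as a small class over the Durable Object storage API. Everything here (`ItemProcessor`, the `attempts` key, `MAX_RETRIES`) is illustrative; the storage interface is narrowed so the logic is self-contained:

```typescript
// Sketch of a Durable Object persisting retry state; names are assumptions.
interface StorageLike {
  get<T>(key: string): Promise<T | undefined>;
  put(key: string, value: unknown): Promise<void>;
}

const MAX_RETRIES = 3; // illustrative cap

class ItemProcessor {
  constructor(private storage: StorageLike) {}

  // Runs one processing attempt. The retry count is persisted, so a fresh
  // invocation of the same object resumes where the last one failed.
  async process(work: () => Promise<void>): Promise<"done" | "retrying" | "failed"> {
    const attempts = (await this.storage.get<number>("attempts")) ?? 0;
    try {
      await work();
      await this.storage.put("attempts", 0); // reset on success
      return "done";
    } catch {
      const next = attempts + 1;
      await this.storage.put("attempts", next);
      return next < MAX_RETRIES ? "retrying" : "failed";
    }
  }
}
```

Because each Durable Object instance is single-threaded, this counter needs no locking: that serialization is what makes Durable Objects suitable for rate-limit coordination as well.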
### Services Layer

#### FirecrawlService
- Web scraping with structured actions
- Page content extraction
- API integration
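"Structured actions" means the scrape request carries a list of browser steps to run before extraction. A hedged sketch of building such a payload; the field names (`formats`, `actions`) are modeled on Firecrawl-style scrape APIs and may not match the service's actual contract:

```typescript
// Illustrative scrape-request builder; payload shape is an assumption.
type Action =
  | { type: "wait"; milliseconds: number }   // let dynamic content load
  | { type: "click"; selector: string };     // e.g. dismiss a cookie banner

function buildScrapePayload(url: string, actions: Action[]) {
  return {
    url,
    formats: ["markdown"], // request extracted page content
    actions,               // structured steps to run before extraction
  };
}

const payload = buildScrapePayload("https://example.com/event", [
  { type: "wait", milliseconds: 500 },
]);
```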
#### InventoryProcessorService
- Core data processing logic
- Price updates
- Data transformation
#### MeilisearchService
- Inventory retrieval
- Search functionality
- Index management
#### RateLimiterService
- API call throttling
- KV-backed storage
- Exponential backoff
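The exponential-backoff piece can be captured in one helper. The constants and the full-jitter variant are assumptions for illustration, not values from the repository:

```typescript
// Illustrative exponential backoff with full jitter; constants are assumptions.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  // Exponential growth, capped: 500, 1000, 2000, ... up to capMs.
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  // Full jitter spreads retries out so workers don't retry in lockstep.
  return Math.floor(Math.random() * exp);
}
```

Jitter matters in a distributed system like this one: without it, many worker instances that failed together retry together, re-triggering the same rate limit.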
#### SkyboxService
- Real-time inventory updates
- API integration
- Price management
## Features

### Scheduled Processing
- Automated inventory processing via cron triggers
- Configurable intervals
- Error handling and retries
### Intelligent Rate Limiting
- KV-backed throttling
- Exponential backoff
- Per-domain rate limits
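KV-backed per-domain throttling can be sketched as a fixed-window counter keyed by domain. The key scheme and window length are assumptions; note that Workers KV is eventually consistent, so limits enforced this way are approximate rather than exact:

```typescript
// Sketch of per-domain throttling over a KV-style store; key scheme is an assumption.
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, opts?: { expirationTtl?: number }): Promise<void>;
}

async function allowRequest(
  kv: KVLike,
  domain: string,
  limit: number,
  windowSeconds: number,
  now: number = Date.now(),
): Promise<boolean> {
  const windowId = Math.floor(now / 1000 / windowSeconds);
  const key = `rate:${domain}:${windowId}`; // one counter per domain per window
  const count = Number((await kv.get(key)) ?? "0");
  if (count >= limit) return false;
  // TTL lets stale window counters expire on their own.
  await kv.put(key, String(count + 1), { expirationTtl: windowSeconds * 2 });
  return true;
}
```

Where exact limits matter, the counter would live in a Durable Object instead, trading KV's global reads for the object's serialized, strongly consistent updates.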
### Distributed Processing
- Queue-based architecture
- Scalable worker instances
- Load distribution
### State Management
- Persistent state using Durable Objects
- Retry logic
- State synchronization
## Data Flow

1. **Scheduled Trigger**
   - Cron triggers worker
   - Fetches inventory from MeiliSearch
2. **Processing**
   - Creates Durable Object for each inventory item
   - Scrapes event listings
   - Updates prices
3. **State Management**
   - Durable Objects maintain state
   - KV stores rate limiting data
   - D1 stores persistent data
4. **Price Updates**
   - Updates sent to Skybox API
   - MeiliSearch index updated
   - Metrics recorded
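The steps above can be sketched as one orchestration function. Every interface here is a stand-in for the real services (MeiliSearch, Firecrawl, Skybox); actual names and signatures will differ:

```typescript
// Illustrative end-to-end flow; all dependency names are assumptions.
interface Item { id: string; url: string; currentPrice: number }

interface Deps {
  fetchInventory(): Promise<Item[]>;                  // stand-in for MeiliSearch
  scrapeListingPrice(url: string): Promise<number>;   // stand-in for Firecrawl
  pushPrice(id: string, price: number): Promise<void>; // stand-in for Skybox API
  recordMetric(name: string): void;
}

async function runScheduledScrape(deps: Deps): Promise<number> {
  const items = await deps.fetchInventory();
  let updated = 0;
  for (const item of items) {
    const marketPrice = await deps.scrapeListingPrice(item.url);
    // Only push a price update when the scraped market price moved.
    if (marketPrice !== item.currentPrice) {
      await deps.pushPrice(item.id, marketPrice);
      updated++;
    }
    deps.recordMetric("item_processed");
  }
  return updated;
}
```

In the real system the loop body runs inside a per-item Durable Object (step 2) rather than sequentially in one worker; the sequential version is shown only to make the flow readable.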
## Configuration

### Wrangler Configuration

Defined in `wrangler.json`:
- Worker routes
- KV namespace bindings
- Durable Object bindings
- Queue bindings
- D1 database bindings
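A hedged sketch of what those bindings look like together. The binding names, IDs, and cron schedule below are placeholders, not the project's actual configuration:

```json
{
  "name": "scraping-worker",
  "main": "src/index.ts",
  "compatibility_date": "2024-01-01",
  "kv_namespaces": [{ "binding": "RATE_LIMIT_KV", "id": "<kv-namespace-id>" }],
  "durable_objects": {
    "bindings": [{ "name": "ITEM_PROCESSOR", "class_name": "ItemProcessor" }]
  },
  "queues": {
    "producers": [{ "binding": "SCRAPE_QUEUE", "queue": "scrape-tasks" }],
    "consumers": [{ "queue": "scrape-tasks" }]
  },
  "d1_databases": [
    { "binding": "DB", "database_name": "scraping", "database_id": "<d1-id>" }
  ],
  "triggers": { "crons": ["*/15 * * * *"] }
}
```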
### Environment Variables

Configured in the Cloudflare dashboard:
- API keys (Firecrawl, MeiliSearch, Skybox)
- Service URLs
- Rate limiting parameters
## Development

### Local Development

```bash
# Start local dev server
bun run dev

# Type check
bun run check
```

### Deployment

```bash
# Deploy to Cloudflare
bun run deploy

# Apply D1 migrations
bun run predeploy
```

### Testing

```bash
# Test scraping
bun run test-scraping

# Trigger manually
bun run trigger-scraping
```
## Monitoring

### Observability
- Sentry integration for error tracking
- Baselime for performance monitoring
- OpenTelemetry for distributed tracing
### Metrics
- Processing rates
- Error rates
- Rate limiting events
- API call counts
## Key Technologies

- **Cloudflare Workers**: Serverless runtime
- **Durable Objects**: Persistent, strongly consistent per-object state
- **KV**: Eventually consistent key-value storage
- **Queues**: Message queuing for distributed processing
- **D1**: SQLite-based serverless database
- **TypeScript**: Type safety