cloudflare-workers Architecture

Serverless scraping architecture using Cloudflare Workers, Durable Objects, KV, and Queues.

Overview

The cloudflare-workers repository implements a distributed scraping system that:

  • Processes Skybox inventory data
  • Scrapes event listings
  • Manages competitor purchases
  • Updates ticket prices in real time

Architecture

Core Components

Scraping Worker

Entry Point: Main worker handling scheduled tasks, HTTP requests, and queue messages

Responsibilities:

  • Scheduled processing via cron triggers
  • HTTP request handling
  • Queue processing
  • Durable Object coordination

Durable Objects

Purpose: Persistent state management and retry handling

Features:

  • Persistent processing state
  • Retry logic
  • Rate limiting coordination
  • State synchronization
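A hypothetical sketch of how a Durable Object might track per-item retry state in its transactional storage; the class name, storage keys, and retry budget below are illustrative, not taken from the repo:

```typescript
// Minimal stand-in for the Durable Object storage API.
interface Storage {
  get<T>(key: string): Promise<T | undefined>;
  put(key: string, value: unknown): Promise<void>;
}
interface DOState { storage: Storage }

// One instance per inventory item: attempts survive worker restarts because
// they live in the object's persistent storage, not in memory.
export class InventoryProcessor {
  constructor(private state: DOState, private maxRetries = 3) {}

  // Record an attempt; returns false once the retry budget is exhausted.
  async shouldRetry(): Promise<boolean> {
    const attempts = (await this.state.storage.get<number>("attempts")) ?? 0;
    if (attempts >= this.maxRetries) return false;
    await this.state.storage.put("attempts", attempts + 1);
    return true;
  }

  // Clear the counter after a successful run.
  async reset(): Promise<void> {
    await this.state.storage.put("attempts", 0);
  }
}
```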

Services Layer

FirecrawlService

  • Web scraping with structured actions
  • Page content extraction
  • API integration

InventoryProcessorService

  • Core data processing logic
  • Price updates
  • Data transformation

MeilisearchService

  • Inventory retrieval
  • Search functionality
  • Index management

RateLimiterService

  • API call throttling
  • KV-backed storage
  • Exponential backoff
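The two halves of the rate limiter can be sketched as follows: a pure backoff-delay calculation and a KV-backed per-domain counter. The base delay, cap, window, and key scheme are assumptions for illustration, not the repo's actual values:

```typescript
// Exponential backoff: delay doubles each attempt (base * 2^attempt), capped.
export function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 60_000): number {
  return Math.min(baseMs * 2 ** attempt, capMs);
}

// Minimal stand-in for a KV namespace binding.
export interface KV {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, opts?: { expirationTtl?: number }): Promise<void>;
}

// Allow at most `limit` calls per domain per window; the count lives in KV
// and expires on its own via the TTL, so there is nothing to clean up.
export async function allowCall(kv: KV, domain: string, limit: number, windowSecs = 60): Promise<boolean> {
  const key = `rate:${domain}`;
  const count = Number((await kv.get(key)) ?? "0");
  if (count >= limit) return false;
  await kv.put(key, String(count + 1), { expirationTtl: windowSecs });
  return true;
}
```

Note that KV is eventually consistent, so a counter like this is a soft limit; strict limits would belong in a Durable Object.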

SkyboxService

  • Real-time inventory updates
  • API integration
  • Price management

Features

Scheduled Processing

  • Automated inventory processing via cron triggers
  • Configurable intervals
  • Error handling and retries

Intelligent Rate Limiting

  • KV-backed throttling
  • Exponential backoff
  • Per-domain rate limits

Distributed Processing

  • Queue-based architecture
  • Scalable worker instances
  • Load distribution

State Management

  • Persistent state using Durable Objects
  • Retry logic
  • State synchronization

Data Flow

  1. Scheduled Trigger
    • Cron triggers the worker
    • Fetches inventory from MeiliSearch
  2. Processing
    • Creates a Durable Object for each inventory item
    • Scrapes event listings
    • Updates prices
  3. State Management
    • Durable Objects maintain processing state
    • KV stores rate-limiting data
    • D1 stores persistent data
  4. Price Updates
    • Updates sent to the Skybox API
    • MeiliSearch index updated
    • Metrics recorded
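The flow above can be sketched as one cycle with the services injected, which makes the ordering explicit; the interfaces and names here are illustrative, not the repository's actual ones, and the state writes of step 3 are elided for brevity:

```typescript
interface Services {
  search: { listInventory(): Promise<{ id: string; url: string }[]> }; // MeiliSearch
  scraper: { currentPrice(url: string): Promise<number> };             // Firecrawl
  skybox: { setPrice(id: string, price: number): Promise<void> };      // Skybox API
}

// One cycle: fetch inventory, scrape each listing, push the new price.
export async function runScrapeCycle(s: Services): Promise<number> {
  const items = await s.search.listInventory();           // step 1
  let updated = 0;
  for (const item of items) {
    const price = await s.scraper.currentPrice(item.url); // step 2
    await s.skybox.setPrice(item.id, price);              // step 4
    updated++;
  }
  return updated; // number of prices pushed this cycle
}
```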

Configuration

Wrangler Configuration

Defined in wrangler.json:

  • Worker routes
  • KV namespace bindings
  • Durable Object bindings
  • Queue bindings
  • D1 database bindings
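A minimal sketch of what those bindings might look like in wrangler.json; every name, ID, and cron schedule here is a placeholder, not the repository's actual configuration:

```json
{
  "name": "scraping-worker",
  "main": "src/index.ts",
  "compatibility_date": "2024-01-01",
  "triggers": { "crons": ["*/15 * * * *"] },
  "kv_namespaces": [
    { "binding": "RATE_LIMIT_KV", "id": "<namespace-id>" }
  ],
  "durable_objects": {
    "bindings": [{ "name": "INVENTORY_DO", "class_name": "InventoryProcessor" }]
  },
  "queues": {
    "producers": [{ "binding": "SCRAPE_QUEUE", "queue": "scrape-jobs" }],
    "consumers": [{ "queue": "scrape-jobs", "max_retries": 3 }]
  },
  "d1_databases": [
    { "binding": "DB", "database_name": "scraping", "database_id": "<database-id>" }
  ]
}
```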

Environment Variables

Configured in Cloudflare dashboard:

  • API keys (Firecrawl, MeiliSearch, Skybox)
  • Service URLs
  • Rate limiting parameters

Development

Local Development

# Start local dev server
bun run dev

# Type check
bun run check

Deployment

# Apply D1 migrations
bun run predeploy

# Deploy to Cloudflare
bun run deploy

Testing

# Test scraping
bun run test-scraping

# Trigger manually
bun run trigger-scraping

Monitoring

Observability

  • Sentry integration for error tracking
  • Baselime for performance monitoring
  • OpenTelemetry for distributed tracing

Metrics

  • Processing rates
  • Error rates
  • Rate limiting events
  • API call counts

Key Technologies

  • Cloudflare Workers: Serverless runtime
  • Durable Objects: Persistent state
  • KV: Key-value storage
  • Queues: Task scheduling
  • D1: SQLite database
  • TypeScript: Type safety
