cloudflare-workers

Serverless scraping architecture using Cloudflare Workers, Durable Objects, KV, and Queues.

Overview

The cloudflare-workers repository implements a distributed scraping system that:

  • Processes Skybox inventory data
  • Scrapes event listings
  • Manages competitor purchases
  • Updates ticket prices in real time

Technology Stack

  • Runtime: Cloudflare Workers
  • Language: TypeScript 5.x
  • Storage: D1 (SQLite), KV (Key-Value), Durable Objects
  • Queue: Cloudflare Queues
  • Scraping: Firecrawl API, Cheerio
  • Search: MeiliSearch
  • Monitoring: Sentry, Baselime, OpenTelemetry

Getting Started

Prerequisites

  • Node.js 18+
  • Bun (preferred) or npm
  • Cloudflare account
  • Wrangler CLI

Setup

  1. Install dependencies:

    bun install

  2. Install the Wrangler CLI as a dev dependency:

    bun add -D wrangler

  3. Configure Cloudflare resources:

    • Create KV namespaces
    • Declare Durable Object classes and their bindings
    • Configure queues
    • Set up the D1 database

  4. Start local development:

    bun run dev

Key Commands

Development

  • bun run dev - Start local development server
  • bun run check - Type check and validate
  • bun run types - Generate TypeScript types

Deployment

  • bun run deploy - Deploy to Cloudflare
  • bun run predeploy - Apply D1 migrations before deploy

Database

  • bun run seedLocalD1 - Apply migrations to local D1

Testing

  • bun run test-scraping - Test scraping functionality
  • bun run trigger-scraping - Trigger scraping locally
  • bun run trigger-processor - Trigger processor locally

Architecture

See the Architecture Documentation for a detailed description of the system.

Core Components

  • Scraping Worker - Main entry point
  • Durable Objects - Persistent state management
  • Services - FirecrawlService, InventoryProcessorService, MeilisearchService, RateLimiterService, SkyboxService

Features

Scheduled Processing

  • Automated inventory processing via cron triggers
  • Configurable intervals
  • Error handling and retries
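
As a rough illustration of how a cron-triggered run with retries might be wired up, the sketch below pairs a scheduled handler with a small retry helper. The names withRetries and processInventory, and the attempt count of 3, are assumptions made for this example and are not taken from the repository. The two interfaces are minimal stand-ins for the arguments Workers passes to a scheduled handler.

```typescript
// Hypothetical sketch: names below are illustrative, not taken from the repository.

// Retry an async task up to `maxAttempts` times (no delay between attempts),
// rethrowing the last error if every attempt fails.
async function withRetries<T>(
  task: () => Promise<T>,
  maxAttempts: number,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

// Minimal structural stand-ins for the Workers scheduled-handler arguments.
interface ScheduledEventLike {
  cron: string;
  scheduledTime: number;
}
interface ExecutionContextLike {
  waitUntil(promise: Promise<unknown>): void;
}

// Assumed placeholder for the real inventory job.
async function processInventory(cron: string): Promise<string> {
  return `processed for ${cron}`;
}

// In a Worker, this object would be the module's default export.
const worker = {
  async scheduled(
    event: ScheduledEventLike,
    _env: unknown,
    ctx: ExecutionContextLike,
  ): Promise<void> {
    // waitUntil keeps the invocation alive until the retried job settles.
    ctx.waitUntil(withRetries(() => processInventory(event.cron), 3));
  },
};
```

The cron expressions themselves would live in the Wrangler configuration; the handler only reacts to whichever trigger fired.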

Intelligent Rate Limiting

  • KV-backed throttling
  • Exponential backoff
  • Per-domain rate limits
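
One way these three ideas could fit together is a fixed-window counter per domain stored in KV, plus a capped exponential backoff for retries. This is only a sketch under stated assumptions: the repository's RateLimiterService may use a different algorithm, and the key format, default delays, and function names here are invented for illustration. KVLike mirrors the get and put methods of a KV namespace binding.

```typescript
// Hypothetical sketch: the repository's RateLimiterService may work differently.

// Subset of the Workers KV binding used here (get/put with a TTL option).
interface KVLike {
  get(key: string): Promise<string | null>;
  put(
    key: string,
    value: string,
    options?: { expirationTtl?: number },
  ): Promise<void>;
}

// Capped exponential backoff: baseMs * 2^attempt, never above maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Fixed-window counter keyed by domain. Returns true if the request may proceed.
// KV is eventually consistent, so the limit is approximate rather than strict.
async function tryAcquire(
  kv: KVLike,
  url: string,
  limitPerWindow: number,
  windowSeconds: number,
  nowMs: number = Date.now(),
): Promise<boolean> {
  const domain = new URL(url).hostname;
  const windowId = Math.floor(nowMs / (windowSeconds * 1000));
  const key = `rate:${domain}:${windowId}`;
  const count = Number((await kv.get(key)) ?? "0");
  if (count >= limitPerWindow) return false;
  // KV requires an expirationTtl of at least 60 seconds.
  await kv.put(key, String(count + 1), {
    expirationTtl: Math.max(60, windowSeconds * 2),
  });
  return true;
}
```

A caller that receives false could wait backoffDelayMs(attempt) milliseconds before trying again. Where strict limits matter, a Durable Object is a common alternative to KV because it serializes access to the counter.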

Distributed Processing

  • Queue-based architecture
  • Scalable worker instances
  • Load distribution
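
The queue-based flow can be sketched as a producer that enqueues one job per URL and a consumer that acknowledges or retries each message. The ScrapeJob shape and the function names are assumptions for this example. The interfaces cover only the parts of the Cloudflare Queues producer and consumer API that the sketch uses (send, ack, retry, and the batch's messages array).

```typescript
// Hypothetical sketch: job shape and function names are illustrative.

interface ScrapeJob {
  url: string;
}

// Subsets of the Cloudflare Queues producer and consumer types.
interface QueueLike<T> {
  send(body: T): Promise<void>;
}
interface MessageLike<T> {
  body: T;
  ack(): void;
  retry(): void;
}
interface MessageBatchLike<T> {
  messages: readonly MessageLike<T>[];
}

// Producer side: enqueue one job per URL so consumers can share the load.
async function enqueueJobs(
  queue: QueueLike<ScrapeJob>,
  urls: string[],
): Promise<void> {
  for (const url of urls) {
    await queue.send({ url });
  }
}

// Consumer side: acknowledge each message on success and ask for
// redelivery on failure, so one bad job does not fail the whole batch.
async function handleBatch(
  batch: MessageBatchLike<ScrapeJob>,
  scrape: (job: ScrapeJob) => Promise<void>,
): Promise<void> {
  for (const message of batch.messages) {
    try {
      await scrape(message.body);
      message.ack();
    } catch {
      message.retry();
    }
  }
}
```

In a Worker, handleBatch would typically be called from the queue handler on the default export, with the scrape callback supplied by whichever service performs the fetch.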

State Management

  • Persistent state using Durable Objects
  • Retry logic
  • State synchronization
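
To make the Durable Object idea concrete, the sketch below tracks retry state for a single scrape job in persistent storage. The class name, the stored fields, and the limit of 3 attempts are assumptions for illustration. A real Durable Object receives its state and env through the constructor and reads storage from that state object; here the storage is passed in directly so the example stays self-contained. StorageLike mirrors the get and put methods of the Durable Object storage API.

```typescript
// Hypothetical sketch: class name and stored fields are illustrative.

// Subset of the Durable Object storage API used below.
interface StorageLike {
  get<T>(key: string): Promise<T | undefined>;
  put<T>(key: string, value: T): Promise<void>;
}

interface JobState {
  attempts: number;
  status: "pending" | "done" | "failed";
}

// A Durable Object-style class tracking retry state for one scrape job.
class ScrapeJobState {
  private readonly storage: StorageLike;
  private readonly maxAttempts: number;

  constructor(storage: StorageLike, maxAttempts = 3) {
    this.storage = storage;
    this.maxAttempts = maxAttempts;
  }

  // Load the persisted state, defaulting to a fresh pending job.
  async read(): Promise<JobState> {
    const stored = await this.storage.get<JobState>("state");
    return stored ?? { attempts: 0, status: "pending" };
  }

  // Record a failed attempt; mark the job failed once attempts are exhausted.
  async recordFailure(): Promise<JobState> {
    const current = await this.read();
    const attempts = current.attempts + 1;
    const next: JobState = {
      attempts,
      status: attempts >= this.maxAttempts ? "failed" : "pending",
    };
    await this.storage.put("state", next);
    return next;
  }

  // Mark the job done while keeping the attempt count for diagnostics.
  async recordSuccess(): Promise<JobState> {
    const current = await this.read();
    const next: JobState = { attempts: current.attempts, status: "done" };
    await this.storage.put("state", next);
    return next;
  }
}
```

Because each Durable Object instance handles its requests one at a time, a read-then-write sequence like this is safer here than it would be against KV.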

Configuration

Wrangler Configuration

Defined in wrangler.json:

  • Worker routes
  • KV namespace bindings
  • Durable Object bindings
  • Queue bindings
  • D1 database bindings
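
Each binding declared in the Wrangler configuration shows up as a property on the env object passed to the Worker. The sketch below shows what a typed view of those bindings could look like, with a small runtime check for missing ones. The binding names (SCRAPE_KV, SCRAPE_QUEUE, DB, JOB_STATE) are placeholders invented for this example, and unknown stands in for the real binding types. The types produced by bun run types would presumably replace a hand-written interface like this one.

```typescript
// Hypothetical sketch: binding names are placeholders, not the repository's.

// Shape of `env` if the configuration declared one KV namespace, one queue,
// one D1 database, and one Durable Object namespace. `unknown` stands in for
// the real binding types.
interface Env {
  SCRAPE_KV: unknown;
  SCRAPE_QUEUE: unknown;
  DB: unknown;
  JOB_STATE: unknown;
}

const REQUIRED_BINDINGS: readonly (keyof Env)[] = [
  "SCRAPE_KV",
  "SCRAPE_QUEUE",
  "DB",
  "JOB_STATE",
];

// List bindings that are missing at runtime, for example after a config typo.
function missingBindings(env: Partial<Env>): string[] {
  return REQUIRED_BINDINGS.filter((name) => env[name] === undefined);
}
```

Calling missingBindings at the start of a handler and failing fast on a non-empty result can make a misconfigured deployment easier to diagnose.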

Environment Variables

Configured in the Cloudflare dashboard:

  • API keys (Firecrawl, MeiliSearch, Skybox)
  • Service URLs
  • Rate limiting parameters
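
Environment variables reach a Worker as strings, so numeric settings such as rate limiting parameters need parsing and validation. The following is a minimal sketch of that step. The variable names RATE_LIMIT_REQUESTS and RATE_LIMIT_WINDOW_SECONDS and the defaults of 10 and 60 are assumptions, not values from the repository.

```typescript
// Hypothetical sketch: variable names and defaults are assumptions.

interface RateLimitConfig {
  requestsPerWindow: number;
  windowSeconds: number;
}

// Parse a positive integer from a string variable, falling back to a default
// when the variable is unset and throwing when it is set but invalid.
function parsePositiveInt(
  name: string,
  raw: string | undefined,
  fallback: number,
): number {
  if (raw === undefined || raw === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

// Build the rate limit settings from string-valued environment variables.
function parseRateLimitConfig(
  vars: Record<string, string | undefined>,
): RateLimitConfig {
  return {
    requestsPerWindow: parsePositiveInt(
      "RATE_LIMIT_REQUESTS",
      vars["RATE_LIMIT_REQUESTS"],
      10,
    ),
    windowSeconds: parsePositiveInt(
      "RATE_LIMIT_WINDOW_SECONDS",
      vars["RATE_LIMIT_WINDOW_SECONDS"],
      60,
    ),
  };
}
```

API keys are better stored as secrets than as plain variables, and they need no parsing beyond a presence check.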

Documentation