cloudflare-workers

Serverless scraping architecture using Cloudflare Workers, Durable Objects, KV, and Queues.

Overview

The cloudflare-workers repository implements a distributed scraping system that:

Processes Skybox inventory data
Scrapes event listings
Manages competitor purchases
Updates ticket prices in real time

Technology Stack

Runtime: Cloudflare Workers
Language: TypeScript 5.x
Storage: D1 (SQLite), KV (Key-Value), Durable Objects
Queue: Cloudflare Queues
Scraping: Firecrawl API, Cheerio
Search: MeiliSearch
Monitoring: Sentry, Baselime, OpenTelemetry

Getting Started

Prerequisites

Node.js 18+
Bun (preferred) or npm
Cloudflare account
Wrangler CLI

Setup

Install dependencies:
```
bun install
```
Install Wrangler CLI:
```
bun add -D wrangler
```
Configure Cloudflare:
- Create KV namespaces
- Create Durable Object classes
- Configure queues
- Set up D1 database
Start local development:
```
bun run dev
```

Key Commands

Development

bun run dev - Start local development server
bun run check - Type check and validate
bun run types - Generate TypeScript types

Deployment

bun run deploy - Deploy to Cloudflare
bun run predeploy - Apply D1 migrations before deploy

Database

bun run seedLocalD1 - Apply migrations to local D1

Testing

bun run test-scraping - Test scraping functionality
bun run trigger-scraping - Trigger scraping locally
bun run trigger-processor - Trigger processor locally

Architecture

See the Architecture Documentation for a detailed description of the system.

Core Components

Scraping Worker - Main entry point
Durable Objects - Persistent state management
Services - FirecrawlService, InventoryProcessorService, MeilisearchService, RateLimiterService, SkyboxService

Features

Scheduled Processing

Automated inventory processing via cron triggers
Configurable intervals
Error handling and retries
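
As a rough illustration of cron-triggered processing with retries, the sketch below wraps a task in a bounded retry loop and calls it from a Workers `scheduled` handler. The names `withRetries` and `processInventory` and the attempt count of 3 are assumptions made for this example, not taken from the repository; the event type is a local stand-in for the runtime's own type.

```typescript
// Local stand-in for the Workers scheduled event, so the sketch is self-contained.
interface ScheduledEventLike {
  cron: string;
  scheduledTime: number;
}

// Hypothetical helper: runs `task` up to `maxAttempts` times and
// rethrows the last error if every attempt fails.
async function withRetries<T>(task: () => Promise<T>, maxAttempts: number): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

// Hypothetical placeholder for the real inventory processing step.
async function processInventory(): Promise<void> {
  // The actual Skybox inventory processing would happen here.
}

// In a real Worker this object would be the module's default export.
const worker = {
  async scheduled(_event: ScheduledEventLike): Promise<void> {
    await withRetries(() => processInventory(), 3);
  },
};
```

The cron schedule itself would be declared in the Wrangler configuration rather than in code, which is what makes the interval configurable without a code change.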

Intelligent Rate Limiting

KV-backed throttling
Exponential backoff
Per-domain rate limits
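
One way these three ideas could fit together is sketched below: a fixed-window counter stored in KV, keyed by hostname, plus a capped exponential backoff delay. This is a minimal sketch under assumptions, not the repository's RateLimiterService; the key format, the defaults, and the function names are hypothetical. Because KV is eventually consistent, a counter like this is approximate rather than strict.

```typescript
// Local stand-in for the subset of the KV binding used here.
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

// Capped exponential backoff: baseMs, 2*baseMs, 4*baseMs, ... up to maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Hypothetical per-domain limiter: allows up to `limit` requests per window.
// `nowMs` is a parameter so the window can be fixed in tests.
async function tryAcquire(
  kv: KVLike,
  url: string,
  limit: number,
  windowSeconds = 60,
  nowMs: number = Date.now(),
): Promise<boolean> {
  const domain = new URL(url).hostname;
  const windowId = Math.floor(nowMs / (windowSeconds * 1000));
  const key = `rate:${domain}:${windowId}`; // hypothetical key format
  const count = Number((await kv.get(key)) ?? "0");
  if (count >= limit) return false;
  // Let the counter expire on its own once the window has passed.
  await kv.put(key, String(count + 1), { expirationTtl: windowSeconds * 2 });
  return true;
}
```

A caller that gets `false` back could wait `backoffDelayMs(attempt)` milliseconds before trying the same domain again.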

Distributed Processing

Queue-based architecture
Scalable worker instances
Load distribution
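
To make the queue-based flow concrete, here is a small sketch of a consumer that acknowledges messages it processes successfully and asks the queue to redeliver the ones that fail. The `ScrapeJob` shape and the `handleBatch` helper are assumptions for illustration; the message and batch interfaces are local stand-ins that mirror the `ack()` and `retry()` methods of Cloudflare Queues messages.

```typescript
// Local stand-ins for the Cloudflare Queues message and batch types.
interface QueueMessageLike<T> {
  body: T;
  ack(): void;
  retry(): void;
}
interface MessageBatchLike<T> {
  queue: string;
  messages: QueueMessageLike<T>[];
}

// Hypothetical job payload.
interface ScrapeJob {
  url: string;
}

// Processes each message independently so one failure does not
// force the whole batch to be redelivered.
async function handleBatch(
  batch: MessageBatchLike<ScrapeJob>,
  scrape: (job: ScrapeJob) => Promise<void>,
): Promise<{ acked: number; retried: number }> {
  let acked = 0;
  let retried = 0;
  for (const message of batch.messages) {
    try {
      await scrape(message.body);
      message.ack();
      acked++;
    } catch {
      message.retry();
      retried++;
    }
  }
  return { acked, retried };
}
```

In a Worker, a function like this would typically be called from the `queue` handler, with `scrape` wired to whatever service performs the actual fetch.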

State Management

Persistent state using Durable Objects
Retry logic
State synchronization
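
The sketch below shows how persistent retry state might be kept in Durable Object storage: a failure counter per job that survives across invocations. The class name, key format, and `recordFailure` method are hypothetical, and the storage interface is a local stand-in for the `get` and `put` methods of Durable Object storage; the repository's actual classes may look quite different.

```typescript
// Local stand-in for the subset of Durable Object storage used here.
interface StorageLike {
  get<T>(key: string): Promise<T | undefined>;
  put<T>(key: string, value: T): Promise<void>;
}

// Hypothetical Durable Object-style class tracking failures per job.
class ScrapeJobState {
  private readonly storage: StorageLike;

  constructor(state: { storage: StorageLike }) {
    this.storage = state.storage;
  }

  // Increments the persisted failure count for `jobId` and reports
  // whether the job is still under its retry budget.
  async recordFailure(jobId: string, maxRetries = 3): Promise<boolean> {
    const key = `failures:${jobId}`; // hypothetical key format
    const failures = ((await this.storage.get<number>(key)) ?? 0) + 1;
    await this.storage.put(key, failures);
    return failures < maxRetries;
  }
}
```

Since each Durable Object instance handles its requests one at a time, a read-then-write counter like this does not need extra locking inside the object.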

Configuration

Wrangler Configuration

Defined in wrangler.json:

Worker routes
KV namespace bindings
Durable Object bindings
Queue bindings
D1 database bindings

Environment Variables

Configured in the Cloudflare dashboard:

API keys (Firecrawl, MeiliSearch, Skybox)
Service URLs
Rate limiting parameters
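
Environment variables reach the Worker as strings, so numeric settings such as rate limiting parameters need to be parsed and given defaults. The following is a minimal sketch of that step; the variable names `RATE_LIMIT_RPM` and `MAX_RETRIES` and the default values are assumptions for the example, not the repository's actual settings.

```typescript
interface RateLimitConfig {
  requestsPerMinute: number;
  maxRetries: number;
}

// Parses a positive integer, falling back when the value is missing or invalid.
function toPositiveInt(value: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(value ?? "", 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

// Hypothetical reader for rate limiting parameters on the Worker's env object.
function readRateLimitConfig(env: Record<string, string | undefined>): RateLimitConfig {
  return {
    requestsPerMinute: toPositiveInt(env["RATE_LIMIT_RPM"], 30),
    maxRetries: toPositiveInt(env["MAX_RETRIES"], 3),
  };
}
```

Falling back to defaults keeps a missing or mistyped variable from turning into `NaN` deep inside the rate limiter.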

Documentation

Architecture Documentation
README.md - Full documentation
Additional documentation:

sprout_backend - Backend API
sprout_etl - ETL service
skybox_token_extractor - Token extraction