Sprout Monitoring Stack - Metrics Specification

This document defines the expected metrics for all services in the Sprout platform. Use this as a reference when implementing monitoring in your services.

📋 Quick Reference by Service Type

Service         | Endpoint     | Pushgateway | Key Metrics
ETL Fastify     | /metrics     | No          | ETL processing, data transformation
NestJS Monolith | /metrics     | No          | HTTP requests, purchase processing, inventory matching
NestJS Worker   | /health only | Yes         | Background job processing
ETL Worker      | /metrics     | Optional    | ETL processing, data validation

🔧 Implementation Libraries

Node.js/NestJS

npm install prom-client pino pino-loki

Note: The prom-client package includes Pushgateway functionality, so no additional packages are needed for workers.

📊 Required Metrics by Service

1. ETL Fastify Service (service="etl-fastify")

✅ Health Metrics

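// Service availability (generated by Prometheus for each scrape target, not by prom-client;
// the service label is assumed to be attached by the scrape configuration)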
up{service="etl-fastify"} 1

🔄 ETL Processing Metrics

// ETL-specific metrics can be added here for data processing operations
// Example: data transformation, validation, and pipeline metrics

2. NestJS Monolith (service="nestjs-monolith")

✅ Health Metrics

up{service="nestjs-monolith"} 1

🌐 HTTP Request Metrics

// Histogram: HTTP request duration
http_request_duration_seconds{service="nestjs-monolith", method="GET|POST|PUT|DELETE", route="/api/purchases", status_code="200|400|500"}

// Counter: HTTP requests total
http_requests_total{service="nestjs-monolith", method="GET", route="/api/purchases", status_code="200"}

// Gauge: Active HTTP connections
http_active_connections{service="nestjs-monolith"}

🛒 Assigned Purchase Processing

// Counter: Total assigned purchase requests
assigned_purchase_requests_total{service="nestjs-monolith", status="success|failed", integration="standard|ticketmaster", website="stubhub|vivid_seats"}

// Histogram: Processing duration
assigned_purchase_processing_duration_seconds{service="nestjs-monolith", integration="standard", website="stubhub"}

// Histogram: Queue waiting time
assigned_purchase_job_queue_duration_seconds{service="nestjs-monolith", integration="standard"}

// Histogram: Cost difference analysis
assigned_purchase_cost_difference_dollars{service="nestjs-monolith", integration="standard", website="stubhub"}

📦 Inventory Management

// Counter: Inventory matching results
assigned_purchase_inventory_matches_total{service="nestjs-monolith", match_type="exact|partial|none", integration="standard"}

⚠️ Error Tracking

// Counter: Errors by type
assigned_purchase_errors_total{service="nestjs-monolith", error_type="validation|network|timeout|inventory", integration="standard"}
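
Because error_type is a label, every distinct value creates a new time series, so it helps to map arbitrary errors onto the small fixed set listed above. The following is a minimal sketch of one way to do that. The helper name classifyError, the 'unknown' fallback, and the matching rules (Node.js error codes and name checks) are assumptions for illustration, not part of this specification.

```typescript
// Allowed values mirror the error_type label above; 'unknown' is an assumed
// fallback that keeps the label set bounded.
type ErrorType = 'validation' | 'network' | 'timeout' | 'inventory' | 'unknown';

// Hypothetical helper: the matching rules are illustrative assumptions.
function classifyError(error: unknown): ErrorType {
  if (!(error instanceof Error)) return 'unknown';

  const code = (error as Error & { code?: unknown }).code;
  const name = error.name.toLowerCase();

  // ETIMEDOUT is the Node.js system error code for a timed-out operation
  if (code === 'ETIMEDOUT' || name.indexOf('timeout') !== -1) return 'timeout';

  // Common Node.js connection-level error codes
  if (code === 'ECONNREFUSED' || code === 'ECONNRESET' || code === 'ENOTFOUND') {
    return 'network';
  }

  if (name.indexOf('validation') !== -1) return 'validation';
  if (name.indexOf('inventory') !== -1) return 'inventory';

  return 'unknown';
}
```

The result can be passed as the error_type label when incrementing assigned_purchase_errors_total, instead of a raw error name or message.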

🔄 Downstream Actions

// Counter: Replacement purchases created
assigned_purchase_replacement_purchases_total{service="nestjs-monolith", integration="standard", website="stubhub"}

// Counter: Loss processing jobs created
assigned_purchase_loss_processing_total{service="nestjs-monolith", integration="standard", website="vivid_seats"}

3. NestJS Worker (service="nestjs-worker") - Pushgateway

🔄 Job Processing (Push to Pushgateway)

// Counter: Jobs processed
worker_jobs_processed_total{service="nestjs-worker", job_type="handle_assigned_purchase|cleanup|notification"}

// Counter: Job failures
worker_jobs_failed_total{service="nestjs-worker", job_type="handle_assigned_purchase", error_type="timeout|validation"}

// Gauge: Currently active jobs
worker_active_jobs{service="nestjs-worker", instance="worker-1"}

// Histogram: Job processing duration
worker_job_duration_seconds{service="nestjs-worker", job_type="handle_assigned_purchase"}

📋 Queue Management

// Gauge: Current queue size
job_queue_size{service="nestjs-worker", queue_name="purchase_processing", job_type="handle_assigned_purchase"}

4. ETL Worker (service="etl-worker")

✅ Health Metrics

up{service="etl-worker"} 1

🔄 ETL Processing

// Counter: Records processed
etl_records_processed_total{service="etl-worker", source="api|file|database", status="success|failed"}

// Histogram: Processing duration per batch
etl_batch_processing_duration_seconds{service="etl-worker", source="api", batch_size="100"}

// Counter: Data validation errors
etl_validation_errors_total{service="etl-worker", error_type="schema|format|business_rule"}

🏷️ Required Labels

All metrics MUST include these labels where applicable:

Core Labels (All Services)

  • service: Service identifier (etl-fastify, nestjs-monolith, etc.)
  • environment: Deployment environment (development, staging, production)
  • version: Service version for deployment tracking
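
One way to keep the core labels identical across metrics and logs is to build them in a single place. The sketch below assumes the same sources the worker example later in this document uses (NODE_ENV and npm_package_version); the helper name buildCoreLabels and the 'unknown' fallbacks are hypothetical.

```typescript
interface CoreLabels {
  service: string;
  environment: string;
  version: string;
}

// Hypothetical helper. The environment variable names and the 'unknown'
// fallbacks are assumptions; adapt them to how your service is deployed.
function buildCoreLabels(
  service: string,
  env: Record<string, string | undefined>,
): CoreLabels {
  return {
    service,
    environment: env.NODE_ENV || 'unknown',
    version: env.npm_package_version || 'unknown',
  };
}

// Example call in a Node.js service:
//   const labels = buildCoreLabels('etl-fastify', process.env);
```

The returned object could then be handed to prom-client's register.setDefaultLabels(...) and reused as the labels option of the pino-loki transport, so both carry the same service name.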

Business Logic Labels

  • integration: Integration type (standard, ticketmaster)
  • website: Target website (stubhub, vivid_seats, seatgeek)
  • job_type: Background job type (handle_assigned_purchase, cleanup)
  • error_type: Specific error classification
  • status: Operation status (success, failed, pending)

HTTP Request Labels

  • method: HTTP method (GET, POST, PUT, DELETE)
  • route: API route template (/api/purchases/:id)
  • status_code: HTTP status code (200, 400, 500)
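
The route label is meant to hold the route template rather than the raw URL, otherwise every purchase ID would produce its own time series. A minimal sketch of building the three HTTP labels follows; the helper name buildHttpLabels and the 'unmatched' placeholder for requests that hit no registered route are assumptions.

```typescript
interface HttpLabels {
  method: string;
  route: string;
  status_code: string;
}

// Hypothetical helper. `routeTemplate` is the matched route pattern
// (e.g. '/api/purchases/:id'), never the raw URL. 'unmatched' is an assumed
// placeholder for requests that match no route (404s, scanners).
function buildHttpLabels(
  method: string,
  routeTemplate: string | undefined,
  statusCode: number,
): HttpLabels {
  return {
    method: method.toUpperCase(),
    route: routeTemplate || 'unmatched',
    status_code: String(statusCode),
  };
}

const exampleLabels = buildHttpLabels('get', '/api/purchases/:id', 200);
// exampleLabels: { method: 'GET', route: '/api/purchases/:id', status_code: '200' }
```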

🔨 Implementation Examples

NestJS Monolith Service

import { Injectable } from '@nestjs/common';
import { Counter, Histogram, register } from 'prom-client';

@Injectable()
export class MetricsService {
private purchaseRequestsTotal: Counter<string>;
private processingDuration: Histogram<string>;

constructor() {
// Create metrics
this.purchaseRequestsTotal = new Counter({
name: 'assigned_purchase_requests_total',
help: 'Total assigned purchase requests processed',
labelNames: ['service', 'status', 'integration', 'website'],
registers: [register]
});

this.processingDuration = new Histogram({
name: 'assigned_purchase_processing_duration_seconds',
help: 'Time spent processing assigned purchases',
labelNames: ['service', 'integration', 'website'],
buckets: [0.1, 0.5, 1, 2, 5, 10, 30, 60],
registers: [register]
});
}

// Usage in your controller or service
async processPurchase(purchaseData: any): Promise<any> {
const timer = this.processingDuration.startTimer({
service: 'nestjs-monolith',
integration: purchaseData.integration,
website: purchaseData.website
});

try {
const result = await this.handlePurchaseProcessing(purchaseData);

this.purchaseRequestsTotal.inc({
service: 'nestjs-monolith',
status: 'success',
integration: purchaseData.integration,
website: purchaseData.website
});

return result;
} catch (error) {
this.purchaseRequestsTotal.inc({
service: 'nestjs-monolith',
status: 'failed',
integration: purchaseData.integration,
website: purchaseData.website
});

throw error;
} finally {
timer();
}
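}

// Placeholder: replace with your purchase processing logic
private async handlePurchaseProcessing(purchaseData: any): Promise<any> {
// Your purchase processing logic here
}
}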
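
Prometheus histogram buckets are cumulative: each bucket counts the observations less than or equal to its upper bound, and an implicit +Inf bucket counts everything. To sanity-check bucket boundaries like the ones configured above, the dependency-free sketch below reproduces that counting for a handful of sample durations. The function name cumulativeBucketCounts and the sample values are made up for illustration; prom-client does this bookkeeping internally.

```typescript
// Hypothetical helper: counts how many observations fall at or below each
// upper bound, plus the implicit +Inf bucket.
function cumulativeBucketCounts(
  upperBounds: number[],
  observations: number[],
): Record<string, number> {
  const sorted = upperBounds.slice().sort((a, b) => a - b);
  const counts: Record<string, number> = {};
  for (const bound of sorted) {
    counts[String(bound)] = observations.filter((v) => v <= bound).length;
  }
  counts['+Inf'] = observations.length;
  return counts;
}

// Same boundaries as assigned_purchase_processing_duration_seconds above
const exampleBuckets = [0.1, 0.5, 1, 2, 5, 10, 30, 60];
// Made-up durations in seconds
const sampleDurations = [0.05, 0.3, 0.8, 1.2, 45, 90];
const bucketCounts = cumulativeBucketCounts(exampleBuckets, sampleDurations);
```

If most observations only show up in the +Inf bucket, or all of them land in one bucket, the boundaries are probably a poor fit for that metric.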

NestJS Worker (Pushgateway)

import { register, Counter, Histogram, Gauge, Pushgateway } from 'prom-client';
import { logger } from './logger.service';
import { config } from '../config';

export class MetricsService {
private readonly logger = logger.child({
service: 'metrics',
});
private readonly pushgateway?: Pushgateway<any>;

// Job Processing Metrics
public readonly jobsProcessed = new Counter({
name: 'worker_jobs_processed_total',
help: 'Total jobs processed by worker',
labelNames: ['service', 'job_type'],
});

public readonly jobProcessingDuration = new Histogram({
name: 'worker_job_duration_seconds',
help: 'Duration of job processing in seconds',
labelNames: ['service', 'job_type', 'status'],
buckets: [0.1, 0.5, 1, 2, 5, 10, 30, 60, 120],
});

public readonly jobFailures = new Counter({
name: 'worker_jobs_failed_total',
help: 'Total number of job failures',
labelNames: ['service', 'job_type', 'error_type'],
});

public readonly activeJobs = new Gauge({
name: 'worker_active_jobs',
help: 'Currently active jobs',
labelNames: ['service', 'instance'],
});

constructor() {
this.logger.info('Initializing Worker metrics service');

// Initialize Pushgateway if URL is configured
if (config.PUSHGATEWAY_URL) {
this.pushgateway = new Pushgateway(config.PUSHGATEWAY_URL);
this.logger.info(`Pushgateway configured: ${config.PUSHGATEWAY_URL}`);
} else {
this.logger.info(
'No Pushgateway URL configured - metrics will only be available via /metrics endpoint',
);
}

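// Apply default labels to every metric in the default registry.
// Note: this does not enable prom-client's default CPU/memory metrics;
// call collectDefaultMetrics() from prom-client if you need those.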
register.setDefaultLabels({
app: 'sprout-worker',
version: process.env.npm_package_version || 'unknown',
service: 'nestjs-worker',
environment: config.NODE_ENV,
});
}

/**
* Push metrics to Pushgateway (for worker instances)
*/
async pushMetrics(jobName: string = 'nestjs-worker'): Promise<void> {
if (!this.pushgateway) {
this.logger.warn('Pushgateway not configured - skipping push');
return;
}

try {
await this.pushgateway.pushAdd({
jobName,
groupings: { instance: process.env.HOSTNAME || 'unknown' },
});
this.logger.debug(`Metrics pushed to Pushgateway for job: ${jobName}`);
} catch (error: any) {
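// Assumes a pino-style logger: the error goes in the merge object, the message comes second
this.logger.error({ err: error }, 'Error pushing metrics to Pushgateway');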
throw error;
}
}

/**
* Record job processing
*/
async handleJob(job: any): Promise<void> {
const instance = process.env.HOSTNAME || 'unknown';

// The status label is supplied when the timer stops, so it is omitted here
const timer = this.jobProcessingDuration.startTimer({
service: 'nestjs-worker',
job_type: job.type,
});

this.activeJobs.labels('nestjs-worker', instance).inc();

try {
await this.processJob(job);

this.jobsProcessed.inc({
service: 'nestjs-worker',
job_type: job.type
});

timer({ status: 'success' });
} catch (error: any) {
this.jobFailures.inc({
service: 'nestjs-worker',
job_type: job.type,
error_type: error.name || 'unknown'
});

// 'failed' matches the status values listed under Business Logic Labels
timer({ status: 'failed' });

throw error;
} finally {
this.activeJobs.labels('nestjs-worker', instance).dec();

// Push once, after the gauge is decremented, on both success and failure.
// pushMetrics() already logs push errors; swallowing them here keeps a
// failed push from masking the job's own error.
await this.pushMetrics().catch(() => undefined);
}
}

private async processJob(job: any): Promise<void> {
// Your job processing logic here
}
}
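
Pushgateway identifies a group of pushed metrics by the job name plus the grouping labels, which are encoded in the request path as /metrics/job/<job>/<label>/<value>. The sketch below rebuilds that path for the values pushMetrics uses, to show why each worker instance ends up with its own group. The helper pushgatewayPath is hypothetical and for illustration only; prom-client builds the path itself.

```typescript
// Hypothetical helper: mirrors the Pushgateway URL layout
// /metrics/job/<job>{/<label>/<value>} for a job name and grouping labels.
function pushgatewayPath(
  jobName: string,
  groupings: Record<string, string> = {},
): string {
  const groupingPath = Object.keys(groupings)
    .map((key) => `/${encodeURIComponent(key)}/${encodeURIComponent(groupings[key])}`)
    .join('');
  return `/metrics/job/${encodeURIComponent(jobName)}${groupingPath}`;
}

// 'worker-1' stands in for the HOSTNAME value used by pushMetrics
const examplePath = pushgatewayPath('nestjs-worker', { instance: 'worker-1' });
// examplePath === '/metrics/job/nestjs-worker/instance/worker-1'
```

With pushAdd, prom-client sends a POST, which replaces only the metrics with matching names inside that group. push sends a PUT, which replaces the whole group. Pushed metrics stay in Pushgateway until they are overwritten or deleted, so a stable instance value matters.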

🔗 Logging Integration

All services should also implement structured logging with Loki:

import pino from 'pino';

const logger = pino({
transport: {
target: 'pino-loki',
options: {
host: 'https://your-monitoring-domain.com:3100',
labels: {
service: 'etl-fastify',
// Label values must be strings, so fall back when NODE_ENV is unset
environment: process.env.NODE_ENV || 'unknown'
}
}
}
});

// Usage
logger.info({
purchaseId: 12345,
integration: 'standard',
website: 'stubhub',
duration: 1.2
}, 'Purchase processed successfully');

📝 Checklist for New Services

When implementing monitoring in a new service:

  • Health endpoint at /health returns 200 OK
  • Metrics endpoint at /metrics exposes Prometheus metrics
  • up metric recorded automatically by Prometheus when it scrapes the service (no application code needed)
  • All business-specific metrics implemented with proper labels
  • Error metrics categorized by error_type
  • Duration metrics use histograms with appropriate buckets
  • Pushgateway integration for background workers
  • Structured logging with Loki transport
  • Service name consistent across metrics and logs
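
The first two checklist items can be sketched as a framework-agnostic handler. It depends only on the two registry members it uses, contentType and an async metrics(), which prom-client's register provides. The names handleMonitoringRequest, MetricsRegistry and HttpResult, as well as the JSON health body, are assumptions for illustration; wire the result into Fastify, NestJS or node:http as appropriate.

```typescript
// Minimal shape this sketch needs from a metrics registry. prom-client's
// `register` has a `contentType` string and an async `metrics()` method.
interface MetricsRegistry {
  contentType: string;
  metrics(): Promise<string>;
}

interface HttpResult {
  status: number;
  headers: Record<string, string>;
  body: string;
}

// Hypothetical handler: maps a request path to a response description.
async function handleMonitoringRequest(
  path: string,
  registry: MetricsRegistry,
): Promise<HttpResult> {
  if (path === '/health') {
    // The response body is an assumption; the checklist only requires 200 OK
    return {
      status: 200,
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ status: 'ok' }),
    };
  }

  if (path === '/metrics') {
    // Serve the registry output with the content type the registry reports
    return {
      status: 200,
      headers: { 'Content-Type': registry.contentType },
      body: await registry.metrics(),
    };
  }

  return { status: 404, headers: { 'Content-Type': 'text/plain' }, body: 'Not found' };
}
```

Keeping the handler independent of the web framework also makes it easy to exercise with a fake registry in unit tests.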

🎯 Metric Naming Conventions

  • Use snake_case for metric names
  • Include units in the name (_seconds, _bytes)
  • Use descriptive prefixes (assigned_purchase_, worker_, etl_)
  • Counters end with _total
  • Use consistent label names across all metrics
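
These conventions are mechanical enough to check in a unit test or lint step. The following sketch shows one possible check. The function name checkMetricName, the list of unit suffixes (taken from metrics used in this document) and the choice to require a unit only on histograms are assumptions.

```typescript
type MetricKind = 'counter' | 'gauge' | 'histogram';

// Hypothetical lint helper: returns a list of problems, which is empty when
// the name follows the conventions above.
function checkMetricName(name: string, kind: MetricKind): string[] {
  const problems: string[] = [];

  // snake_case: lowercase words separated by single underscores
  if (!/^[a-z][a-z0-9]*(_[a-z0-9]+)*$/.test(name)) {
    problems.push('name is not snake_case');
  }

  if (kind === 'counter' && !/_total$/.test(name)) {
    problems.push('counter does not end with _total');
  }

  // Assumed unit list, based on the metrics defined in this document
  if (kind === 'histogram' && !/_(seconds|bytes|dollars)$/.test(name)) {
    problems.push('histogram name has no unit suffix');
  }

  return problems;
}

const okProblems = checkMetricName('worker_jobs_processed_total', 'counter');
const badProblems = checkMetricName('workerJobs', 'counter');
```

Here okProblems is empty, while badProblems reports both the casing and the missing _total suffix.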

This specification ensures all services provide consistent, comprehensive monitoring data for the Sprout platform.