sprout_monitoring Architecture

A monitoring stack built on Prometheus, Grafana, Loki, Pushgateway, and Alertmanager.

Overview

The sprout_monitoring repository provides:

  • Prometheus for metrics collection
  • Grafana for visualization and dashboards
  • Loki for centralized logging
  • Pushgateway for metrics from batch jobs and workers
  • Alertmanager for alert routing and notification

Architecture

Components

Prometheus

Purpose: Metrics collection and storage

Configuration:

  • Scrapes HTTP endpoints (/metrics)
  • Collects from Pushgateway
  • Stores time-series data
  • Can evaluate alert rules, although this stack defines its rules in Grafana (see Alerting)

Targets:

  • sprout_backend API (port 3000)
  • sprout_etl API (port 3001)
  • Pushgateway (port 9091) for worker metrics

Grafana

Purpose: Visualization and dashboards

Features:

  • Pre-configured dashboards
  • Real-time metrics visualization
  • Log exploration
  • Alert management

Dashboards:

  • Sprout System Health
  • ETL System Monitoring
  • Inventory Monitoring
  • TickOps Quota Monitoring

Loki

Purpose: Centralized logging

Features:

  • Aggregates logs from services
  • Log querying and exploration
  • Integration with Grafana

Pushgateway

Purpose: Metrics from batch jobs and workers

Usage:

  • Background workers push metrics
  • Batch jobs report completion
  • Prometheus scrapes Pushgateway

Alertmanager

Purpose: Alert routing and notification

Features:

  • Alert deduplication
  • Grouping and routing
  • Notification channels (email, Slack, etc.)
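The grouping and deduplication behaviour can be illustrated with a small conceptual sketch. This is not Alertmanager's implementation: the `group_alerts` helper and the alert names are hypothetical, and the only assumption is that an alert is identified by its full label set.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Group alerts by selected labels and collapse exact duplicates.

    Conceptual model only: an alert is treated as a duplicate when its
    complete label set matches one already seen in the same group.
    """
    groups = defaultdict(dict)
    for alert in alerts:
        labels = alert["labels"]
        # Alerts sharing these label values end up in one notification group.
        group_key = tuple(labels.get(name, "") for name in group_by)
        # The full, sorted label set acts as the alert's identity.
        fingerprint = tuple(sorted(labels.items()))
        groups[group_key][fingerprint] = alert
    return {key: list(members.values()) for key, members in groups.items()}


# Hypothetical alerts; the first two are identical and collapse into one.
alerts = [
    {"labels": {"alertname": "ServiceDown", "service": "etl_fastify"}},
    {"labels": {"alertname": "ServiceDown", "service": "etl_fastify"}},
    {"labels": {"alertname": "ServiceDown", "service": "nestjs_monolith"}},
    {"labels": {"alertname": "HighErrorRate", "service": "etl_fastify"}},
]
grouped = group_alerts(alerts, ["alertname"])
```

Grouping by `alertname` here yields one group per alert type, so a single notification can cover every affected service.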

Service Discovery

Services are configured in config/services.yaml:

services:
  etl_fastify:
    host: "host.docker.internal"
    port: 3001
    enabled: true

  nestjs_monolith:
    host: "host.docker.internal"
    port: 3000
    enabled: true

  nestjs_worker:
    host: "host.docker.internal"
    port: 4001
    enabled: false
    pushgateway: true

Metrics Collection

HTTP Metrics

Services expose Prometheus metrics at /metrics:

  • Request duration
  • Success rates
  • Error counts
  • Business metrics
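For readers unfamiliar with what a /metrics response looks like, the sketch below renders a counter in the Prometheus text exposition format. The metric name `http_requests_total`, its labels, and the `render_counter` helper are illustrative assumptions; the services in this stack would normally rely on a Prometheus client library rather than hand-written rendering.

```python
def render_counter(name, help_text, samples):
    """Render one counter family in the Prometheus text exposition format.

    `samples` is a list of (labels_dict, value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        # Sort labels so the output is stable between scrapes.
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric name and labels, for illustration only.
body = render_counter(
    "http_requests_total",
    "Total HTTP requests.",
    [({"status": "200", "method": "GET"}, 5)],
)
```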

Pushgateway Metrics

Workers push metrics to Pushgateway:

  • Job processing duration
  • Job success/failure counts
  • Queue sizes
  • Active jobs
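A push from a worker might look roughly like the following sketch, which builds an HTTP request against the Pushgateway's `/metrics/job/<job>` endpoint using only the standard library. The base URL, the `instance` grouping label, and the metric name `jobs_active` are assumptions, not values taken from this repository; label values are assumed not to contain a slash.

```python
import urllib.request
from urllib.parse import quote


def pushgateway_url(base_url, job, grouping=None):
    """Build the push URL: /metrics/job/<job>{/<label>/<value>}."""
    path = f"/metrics/job/{quote(job, safe='')}"
    for label, value in (grouping or {}).items():
        path += f"/{quote(label, safe='')}/{quote(value, safe='')}"
    return base_url.rstrip("/") + path


def build_push_request(base_url, job, body, grouping=None):
    """Prepare a PUT, which replaces all metrics in the grouping key."""
    return urllib.request.Request(
        pushgateway_url(base_url, job, grouping),
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


# Hypothetical gauge reported by a worker.
payload = "# TYPE jobs_active gauge\njobs_active 3\n"
request = build_push_request(
    "http://localhost:9091/", "nestjs_worker", payload, {"instance": "worker-1"}
)
# To send it: urllib.request.urlopen(request, timeout=5)
```

Using POST instead of PUT would replace only metrics with the same names, leaving other metrics in the group untouched.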

Dashboards

Sprout System Health

URL: /d/sprout-system-health

Panels:

  • Service status indicators
  • Request rates and success rates
  • Response time percentiles
  • Error rates by service
  • Job queue sizes
  • Worker metrics
  • Business metrics

ETL System Monitoring

URL: /d/etl-system-monitoring

Panels:

  • ETL pipeline health
  • Processing rates
  • Error tracking
  • Queue metrics

Inventory Monitoring

URL: /d/inventory-monitoring

Panels:

  • Inventory counts
  • Matching rates
  • Price changes
  • Purchase processing

Alerting

Alert Rules

Configured in Grafana (not Prometheus):

  • Service down alerts
  • High error rate alerts
  • Performance degradation
  • Worker health issues
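As a rough illustration of how a high error rate alert behaves, the sketch below fires only when the error rate has stayed above a threshold for several consecutive evaluations, similar in spirit to a pending period. The 5% threshold, the three-evaluation window, and both helper names are assumptions for the example, not the rules configured in this stack.

```python
def error_rate(errors, total):
    """Fraction of failed requests; 0.0 when there was no traffic."""
    return errors / total if total > 0 else 0.0


def should_fire(rates, threshold, pending_evaluations):
    """Fire only if the most recent evaluations all exceed the threshold."""
    if pending_evaluations <= 0 or len(rates) < pending_evaluations:
        return False
    recent = rates[-pending_evaluations:]
    return all(rate > threshold for rate in recent)


# Hypothetical evaluation history: one reading per evaluation interval.
history = [0.01, 0.08, 0.09, 0.12]
firing = should_fire(history, threshold=0.05, pending_evaluations=3)
```

Requiring several consecutive breaches is one common way to reduce alert noise from short spikes.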

Notification Channels

  • Email
  • Slack webhooks
  • PagerDuty
  • Custom webhooks

Configuration

Prometheus Config

Generated from config/services.yaml:

./scripts/generate-prometheus-config.sh
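The script's internals are not shown here, so the following is only a sketch of what such a generation step could do: turn the entries of config/services.yaml into Prometheus scrape jobs. The services are inlined as a dict to avoid a YAML dependency, the `pushgateway:9091` target hostname is an assumption, and treating any `pushgateway: true` entry as a reason to scrape the Pushgateway is a guess at the intended semantics.

```python
# Mirrors config/services.yaml as a plain dict (no YAML library needed).
SERVICES = {
    "etl_fastify": {"host": "host.docker.internal", "port": 3001, "enabled": True},
    "nestjs_monolith": {"host": "host.docker.internal", "port": 3000, "enabled": True},
    "nestjs_worker": {
        "host": "host.docker.internal",
        "port": 4001,
        "enabled": False,
        "pushgateway": True,
    },
}


def build_scrape_configs(services, pushgateway_target="pushgateway:9091"):
    """Return a list shaped like the scrape_configs section of prometheus.yml."""
    jobs = []
    for name, svc in services.items():
        # Only enabled services are scraped directly over HTTP.
        if svc.get("enabled"):
            jobs.append({
                "job_name": name,
                "metrics_path": "/metrics",
                "static_configs": [{"targets": [f"{svc['host']}:{svc['port']}"]}],
            })
    # Assumption: any service flagged pushgateway=true implies one Pushgateway job.
    if any(svc.get("pushgateway") for svc in services.values()):
        jobs.append({
            "job_name": "pushgateway",
            "honor_labels": True,  # keep the job/instance labels pushed by workers
            "static_configs": [{"targets": [pushgateway_target]}],
        })
    return jobs


scrape_configs = build_scrape_configs(SERVICES)
```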

Grafana Provisioning

  • Datasources: grafana/provisioning/datasources/
  • Dashboards: grafana/provisioning/dashboards/
  • Alerting: grafana/provisioning/alerting/

Deployment

Local Development

docker-compose up -d

Production

./scripts/deploy-monitoring-droplet.sh

This script:

  • Creates a DigitalOcean droplet
  • Sets up Docker
  • Configures a Cloudflare Tunnel
  • Deploys the monitoring stack

Access

Local

Production

  • Access via Cloudflare Tunnel
  • SSL certificates configured
  • Authentication required

Maintenance

Backup Configuration

# Back up configs (create the target first so both copies land inside it)
mkdir -p backups
cp -r config backups/
cp -r grafana/provisioning backups/
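If repeated backups should not overwrite each other, a timestamped variant is one option. The sketch below is an assumption about how that could look, not part of the repository: the `backup_configs` helper and the `backups/<timestamp>/` layout are hypothetical.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def backup_configs(root, sources=("config", "grafana/provisioning"),
                   dest="backups", now=None):
    """Copy each source directory into backups/<UTC timestamp>/<source>."""
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%dT%H%M%SZ")
    target = Path(root) / dest / stamp
    for rel in sources:
        src = Path(root) / rel
        # Skip sources that do not exist instead of failing the whole backup.
        if src.is_dir():
            shutil.copytree(src, target / rel)
    return target
```

Calling `backup_configs(".")` from the repository root would create a new directory per run, which makes it easy to compare or restore an earlier configuration.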

Update Dashboards

  1. Edit dashboard JSON in grafana/provisioning/dashboards/
  2. Restart Grafana or reload provisioning
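Before restarting Grafana, it can help to check that the edited files still parse. The sketch below is a hypothetical pre-flight check: it verifies that every `*.json` file directly inside a directory is valid JSON and carries the `uid` and `title` fields that Grafana dashboards use, and it flags duplicate `uid` values. It does not validate panels or queries.

```python
import json
from pathlib import Path


def check_dashboards(directory):
    """Return a list of human-readable problems found in dashboard files."""
    problems = []
    seen_uids = {}
    for path in sorted(Path(directory).glob("*.json")):
        try:
            dashboard = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{path.name}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(dashboard, dict):
            problems.append(f"{path.name}: top level is not a JSON object")
            continue
        for field in ("uid", "title"):
            if not dashboard.get(field):
                problems.append(f"{path.name}: missing '{field}'")
        uid = dashboard.get("uid")
        if uid and uid in seen_uids:
            problems.append(f"{path.name}: uid '{uid}' already used by {seen_uids[uid]}")
        elif uid:
            seen_uids[uid] = path.name
    return problems
```

An empty result from `check_dashboards("grafana/provisioning/dashboards")` suggests the files are at least structurally loadable.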

Review Alerts

  • Adjust thresholds based on historical data
  • Test alert channels regularly
  • Review alert noise
