Grafana Dashboards

The monitoring stack comes with pre-provisioned dashboards covering every layer of the infrastructure.

1. FirstBreath Platform Dashboard (Custom)

File: grafana_dashboard.json

This is the main "Command Center" for the vision pipeline. Use it to track application-level (business logic) metrics.

  • Key Panels: Check these to see whether the application is healthy.
    • Total Camera FPS: Are streams active?
    • Inference Latency: Is the AI slow?
    • Redis Queue Depth: Is the backlog growing?
    • Active Workers: Are the batch-inference containers running?
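Whether the backlog is growing is easier to judge from a trend than from a single reading. The following is a minimal sketch, assuming queue-depth readings have been collected as (timestamp, depth) pairs; the function name `queue_growth_rate` and the sample values are hypothetical and not part of the stack.

```python
from typing import Sequence, Tuple


def queue_growth_rate(samples: Sequence[Tuple[float, float]]) -> float:
    """Return the least-squares slope of queue depth, in items per second.

    `samples` holds (timestamp_seconds, queue_depth) pairs.
    A clearly positive result means the backlog is growing.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_d = sum(d for _, d in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("timestamps must not all be identical")
    cov = sum((t - mean_t) * (d - mean_d) for t, d in samples)
    return cov / var_t


# Hypothetical readings taken every 15 seconds.
readings = [(0, 10), (15, 40), (30, 70), (45, 100)]
print(queue_growth_rate(readings))  # 2.0 items per second
```

A slope near zero suggests the workers are keeping up; a sustained positive slope suggests frames arrive faster than they are processed.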

2. Host Overview (Node Exporter)

File: node-exporter.json (Community ID: 1860)

Use this to check the health of the physical or virtual machine (VPS) that hosts the stack.

  • CPU Busy: Is the host processor maxed out?
  • RAM Used: Is memory nearly exhausted, pushing the system into swap?
  • Disk I/O: Is the database writing too much to disk?
  • Network Bandwidth: Are the RTSP streams saturating the 1 Gbps link?
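The saturation question comes down to simple arithmetic on byte counters. Node Exporter reports cumulative received bytes (for example `node_network_receive_bytes_total`), so utilization is the byte delta converted to bits per second and divided by link capacity. The sketch below assumes two counter readings taken by hand; the function name `link_utilization` and the sample numbers are hypothetical.

```python
def link_utilization(bytes_start: int, bytes_end: int, interval_s: float,
                     link_bps: float = 1e9) -> float:
    """Return the fraction of link capacity used between two counter readings.

    `link_bps` defaults to 1 Gbps, the link speed mentioned above.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    if bytes_end < bytes_start:
        raise ValueError("counter went backwards (possible reset)")
    bits_per_second = (bytes_end - bytes_start) * 8 / interval_s
    return bits_per_second / link_bps


# Hypothetical: 1.5 GB received over 15 seconds.
print(f"{link_utilization(0, 1_500_000_000, 15):.0%}")  # 80%
```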

3. Docker Containers (cAdvisor)

File: cadvisor.json (Community ID: 14282)

Use this to debug specific containers (e.g., "Why did redis-worker crash?").

  • Per-Container Memory: Identify memory leaks in Python services.
  • Per-Container CPU: See which service hogs the processor.
  • Network Rx/Tx: Track bandwidth per service.
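One rough way to tell a leak from normal sawtooth memory usage is to check whether the memory floor keeps rising: usage that drops after garbage collection but never returns to its earlier low point is suspicious. The sketch below is a heuristic under that assumption, not what the dashboard itself computes; `window_minimums`, `looks_like_leak`, and the sample values are hypothetical.

```python
from typing import List, Sequence


def window_minimums(samples: Sequence[float], window: int) -> List[float]:
    """Split samples into consecutive full windows and return each minimum."""
    if window <= 0:
        raise ValueError("window must be positive")
    last_start = len(samples) - window
    return [min(samples[i:i + window]) for i in range(0, last_start + 1, window)]


def looks_like_leak(samples: Sequence[float], window: int,
                    min_windows: int = 3) -> bool:
    """Return True when the per-window memory floor rises in every window."""
    floors = window_minimums(samples, window)
    if len(floors) < min_windows:
        return False  # not enough data to judge
    return all(later > earlier for earlier, later in zip(floors, floors[1:]))


# Hypothetical memory readings in MiB.
leaking = [100, 150, 120, 130, 180, 140, 160, 210, 170]
stable = [100, 150, 120, 100, 160, 110, 100, 150, 105]
print(looks_like_leak(leaking, window=3))  # True
print(looks_like_leak(stable, window=3))   # False
```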

4. NVIDIA GPU Metrics (DCGM)

File: dcgm.json

Use this to monitor the GPU hardware that runs the neural network.

  • GPU Utilization: Percentage of time during which at least one kernel was executing on the GPU.
  • Memory Allocated: VRAM usage (Critical for YOLO models).
  • Temperature: Ensure cooling is adequate (thermal throttling lowers clock speeds and hurts performance).
  • Power Usage: Wattage tracking.

📚 Import Notes

These dashboards are located in monitoring/dashboards/ and are automatically imported by Grafana via the provisioning configuration.

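Note: Change provisioned dashboards in their JSON source, not in the UI. If allowUiUpdates is set to false, Grafana refuses to save UI edits to a provisioned dashboard; if it is set to true, UI edits are saved to Grafana's database but are overwritten the next time the dashboard is reloaded from its file.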
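Because the JSON files are the source of truth, a quick sanity check before recreating the container can catch a broken edit early. The sketch below only verifies that each file parses as a JSON object and has a non-empty "title" field; the helper name `check_dashboards` is hypothetical, and the check is far from a full validation of Grafana's dashboard schema.

```python
import json
from pathlib import Path
from typing import Dict, List


def check_dashboards(directory: str) -> Dict[str, List[str]]:
    """Map each *.json file name in `directory` to a list of problems found."""
    report: Dict[str, List[str]] = {}
    for path in sorted(Path(directory).glob("*.json")):
        issues: List[str] = []
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            issues.append(f"invalid JSON: {exc}")
        else:
            if not isinstance(data, dict):
                issues.append("top level is not a JSON object")
            elif not data.get("title"):
                issues.append("missing or empty 'title'")
        report[path.name] = issues
    return report


if __name__ == "__main__":
    # Directory taken from the text above; adjust if your layout differs.
    for name, issues in check_dashboards("monitoring/dashboards").items():
        print(f"{name}: {'ok' if not issues else '; '.join(issues)}")
```

An empty list for a file means no problem was detected by these two checks.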