Grafana Dashboards
The monitoring stack comes with pre-provisioned dashboards covering every layer of the infrastructure.
1. FirstBreath Platform Dashboard (Custom)
File: grafana_dashboard.json
This is the main "Command Center" for the vision pipeline. Use it to track business logic and to see whether the application is healthy.
Key Panels:
- Total Camera FPS: Are streams active?
- Inference Latency: Is the AI slow?
- Redis Queue Depth: Is the backlog growing?
- Active Workers: Are the batch-inference containers running?
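One way to catch a broken or outdated export is to confirm that the panels listed above are present in grafana_dashboard.json. The sketch below is a minimal check, assuming the panel titles in the JSON match the names in this list exactly. The helper names are illustrative and not part of the repository. It relies on the usual Grafana dashboard layout, where panels live in a top-level `panels` array and collapsed rows nest their own `panels`.

```python
import json
from typing import Iterator, List

# Assumption: these strings match the panel titles in grafana_dashboard.json exactly.
EXPECTED_PANELS = [
    "Total Camera FPS",
    "Inference Latency",
    "Redis Queue Depth",
    "Active Workers",
]


def iter_panel_titles(dashboard: dict) -> Iterator[str]:
    """Yield every panel title, descending into rows that nest their panels."""
    stack = list(dashboard.get("panels", []))
    while stack:
        panel = stack.pop()
        title = panel.get("title")
        if title:
            yield title
        # Collapsed rows keep their child panels in a nested "panels" list.
        stack.extend(panel.get("panels", []))


def missing_panels(dashboard: dict, expected: List[str] = EXPECTED_PANELS) -> List[str]:
    """Return the expected titles that are absent, in their original order."""
    present = set(iter_panel_titles(dashboard))
    return [title for title in expected if title not in present]


def check_file(path: str) -> List[str]:
    """Load a dashboard JSON file from disk and report its missing panels."""
    with open(path, encoding="utf-8") as handle:
        return missing_panels(json.load(handle))
```

Calling `check_file("monitoring/dashboards/grafana_dashboard.json")` returns an empty list when all four panels are found.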
2. Host Overview (Node Exporter)
File: node-exporter.json (Community ID: 1860)
Use this to check the health of the host machine, whether physical or virtual (VPS).
- CPU Busy: Is the host processor maxed out?
- RAM Used: Is memory close to exhausted, so that the system may start swapping?
- Disk I/O: Is the database writing too much to disk?
- Network Bandwidth: Is the RTSP traffic saturating the 1 Gbps link?
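The network question comes down to a unit conversion: node_exporter exposes network counters in bytes (for example `node_network_receive_bytes_total`), while link capacity is quoted in bits per second. A minimal sketch of the arithmetic, assuming the throughput has already been obtained as a rate in bytes per second (the function name is illustrative):

```python
def link_utilization(bytes_per_second: float, link_bits_per_second: float = 1e9) -> float:
    """Return the fraction of link capacity in use (1.0 means saturated).

    The default capacity of 1e9 bits/s corresponds to the 1 Gbps link.
    """
    if link_bits_per_second <= 0:
        raise ValueError("link capacity must be positive")
    # Counters are in bytes, capacity is in bits: multiply by 8 before dividing.
    return bytes_per_second * 8 / link_bits_per_second
```

For instance, a sustained 62.5 MB/s gives `link_utilization(62_500_000) == 0.5`, that is, half of the link.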
3. Docker Containers (cAdvisor)
File: cadvisor.json (Community ID: 14282)
Use this to debug specific containers (e.g., "Why did redis-worker crash?").
- Per-Container Memory: Identify memory leaks in Python services.
- Per-Container CPU: See which service hogs the processor.
- Network Rx/Tx: Track bandwidth per service.
4. NVIDIA GPU Metrics (DCGM)
File: dcgm.json
Use this to monitor the GPU hardware that runs the neural networks.
- GPU Utilization: Percentage of time during which at least one kernel was executing on the GPU.
- Memory Allocated: VRAM usage (Critical for YOLO models).
- Temperature: Ensure cooling is adequate (Thermal throttling kills performance).
- Power Usage: Wattage tracking.
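The same values can be read outside Grafana through the Prometheus HTTP API, which can be handy for scripted health checks. The sketch below assumes dcgm-exporter's default metric names and a Prometheus server reachable at http://localhost:9090. Both are assumptions, and the queries actually used in dcgm.json may differ.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List, Tuple

# Panel name -> metric name. Assumed dcgm-exporter defaults, not read from dcgm.json.
PANEL_METRICS = {
    "GPU Utilization": "DCGM_FI_DEV_GPU_UTIL",   # percent
    "Memory Allocated": "DCGM_FI_DEV_FB_USED",   # MiB of framebuffer in use
    "Temperature": "DCGM_FI_DEV_GPU_TEMP",       # degrees Celsius
    "Power Usage": "DCGM_FI_DEV_POWER_USAGE",    # watts
}


def build_query_url(base_url: str, expr: str) -> str:
    """Build the URL for a Prometheus instant query (/api/v1/query)."""
    params = urllib.parse.urlencode({"query": expr})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def parse_instant_vector(payload: dict) -> List[Tuple[Dict[str, str], float]]:
    """Turn an instant-query response into (labels, value) pairs."""
    if payload.get("status") != "success":
        raise RuntimeError("Prometheus query failed: %r" % payload.get("error"))
    # Each sample looks like {"metric": {...}, "value": [timestamp, "string value"]}.
    return [(s["metric"], float(s["value"][1])) for s in payload["data"]["result"]]


def query(base_url: str, expr: str, timeout: float = 5.0):
    """Run an instant query against a live Prometheus server."""
    with urllib.request.urlopen(build_query_url(base_url, expr), timeout=timeout) as resp:
        return parse_instant_vector(json.load(resp))
```

With a running stack, `query("http://localhost:9090", PANEL_METRICS["Temperature"])` would return one entry per GPU.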
📚 Import Notes
These dashboards are located in monitoring/dashboards/ and are automatically imported by Grafana via the provisioning configuration.
Note: Do not edit dashboards in the UI manually if allowUiUpdates is set to false; changes will be lost on container recreation! Edit the JSON source instead.
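Before recreating the container, it can be useful to check whether a dashboard exported from the UI has drifted from its JSON source. The sketch below is one minimal way to do that, assuming the export was saved to a local file. The helper names and the set of ignored keys are illustrative choices: `id` and `version` are bookkeeping fields that Grafana assigns on save, so they are excluded from the comparison.

```python
import json

# Top-level fields that change on every save and carry no dashboard content.
VOLATILE_KEYS = {"id", "version"}


def normalize(dashboard: dict) -> dict:
    """Return a copy of the dashboard without the volatile top-level fields."""
    return {key: value for key, value in dashboard.items() if key not in VOLATILE_KEYS}


def has_drift(source: dict, exported: dict) -> bool:
    """True when the exported dashboard differs from the JSON source."""
    return normalize(source) != normalize(exported)


def files_differ(source_path: str, exported_path: str) -> bool:
    """Compare two dashboard JSON files on disk."""
    with open(source_path, encoding="utf-8") as src, open(exported_path, encoding="utf-8") as exp:
        return has_drift(json.load(src), json.load(exp))
```

If `files_differ` returns True, the UI changes should be copied back into the file under monitoring/dashboards/ so they are not lost.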