Troubleshooting

Common issues encountered when deploying the monitoring stack and how to solve them.

🛑 "NVIDIA CUDA: NO" in Logs

Symptoms:

camera-manager logs show NVIDIA CUDA: NO during startup.
Log error: Could not find decoder 'h264_cuvid'.
High CPU usage (ffmpeg falling back to software decoding).

Cause: The ultralytics package (YOLO) declares opencv-python (the CPU-only build) as a dependency. pip installs it automatically, overwriting the custom CUDA-enabled OpenCV build provided by the base image.

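Solution: Explicitly uninstall the standard OpenCV packages in the Dockerfile, in the same step that installs the requirements:

# Correct sequence: install the requirements first, then remove the CPU-only OpenCV wheels that pip pulled in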
RUN pip install -r requirements.cuda.txt && \
    pip uninstall -y opencv-python opencv-python-headless

Then rebuild the image: docker-compose up -d --build --force-recreate.
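To confirm the fix took effect, you can ask OpenCV itself which build is loaded. The sketch below is a minimal check, assuming it is run with the Python interpreter inside the rebuilt camera-manager container; the helper name cuda_enabled is hypothetical. It reads the "NVIDIA CUDA" line from cv2.getBuildInformation() and prints the path cv2 was imported from, which helps spot a leftover pip wheel.

```python
import re


def cuda_enabled(build_info: str) -> bool:
    """Return True if an OpenCV build report lists NVIDIA CUDA as YES."""
    match = re.search(r"NVIDIA CUDA:\s*(YES|NO)", build_info)
    return bool(match) and match.group(1) == "YES"


def main() -> None:
    try:
        import cv2  # imported lazily so the parser above works without OpenCV
    except ImportError:
        print("cv2 is not importable in this environment")
        return
    # The import path shows whether the base image build or a pip wheel is in use.
    print("cv2 loaded from:", getattr(cv2, "__file__", None))
    status = "YES" if cuda_enabled(cv2.getBuildInformation()) else "NO"
    print("NVIDIA CUDA:", status)


if __name__ == "__main__":
    main()
```

If the script still reports NO after the rebuild, check that no later Dockerfile step reinstalls opencv-python.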

📉 Grafana: "Datasource not found"

Symptoms:

Dashboards show red error boxes.
Error message: Datasource ${DS_PROMETHEUS} was not found.

Cause: This happens when you import dashboards exported from another Grafana instance. The export references the datasource through a template variable (${DS_PROMETHEUS}), but your Grafana instance expects a hardcoded UID (e.g., prometheus).

Solution:

Use hardcoded UID: In datasource.yml, ensure uid: prometheus.
Clean JSON: Replace "${DS_PROMETHEUS}" with "prometheus" in all dashboard JSON files.
Delete Cache: Grafana stores dashboards in its SQLite DB. If you updated the JSON file but see no change, Grafana is serving the cached version.
- Fix: Set allowUiUpdates: false in dashboard.yml to force file-based loading, or delete the grafana-data volume.
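The "Clean JSON" step can be scripted rather than done by hand. This is a minimal sketch, assuming the dashboards are .json files under a single directory that you pass on the command line; the function name fix_dashboards and the script name in the usage note are hypothetical. It rewrites every file containing the placeholder and reports which files changed.

```python
from __future__ import annotations

import sys
from pathlib import Path

PLACEHOLDER = "${DS_PROMETHEUS}"
DATASOURCE_UID = "prometheus"  # must match the uid set in datasource.yml


def fix_dashboards(directory: str) -> list[str]:
    """Replace the datasource placeholder in every dashboard JSON file.

    Returns the names of the files that were modified.
    """
    changed = []
    for path in sorted(Path(directory).rglob("*.json")):
        text = path.read_text(encoding="utf-8")
        if PLACEHOLDER in text:
            path.write_text(text.replace(PLACEHOLDER, DATASOURCE_UID), encoding="utf-8")
            changed.append(path.name)
    return changed


if __name__ == "__main__" and len(sys.argv) == 2:
    for name in fix_dashboards(sys.argv[1]):
        print("updated", name)
```

Run it as python fix_dashboards.py followed by the path to your dashboards directory, then restart Grafana so the provisioned files are reloaded.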

🌐 "Server Misbehaving" (DNS Error)

Symptoms:

The Prometheus Targets page shows monitoring-cadvisor or monitoring-node-exporter as DOWN.
Error: server returned HTTP status 503 or dial tcp: lookup monitoring-cadvisor on ...: no such host.

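Cause: The containers are likely not on the same Docker network (monitor-net). Docker's embedded DNS resolves container names only for containers attached to the same user-defined network, so Prometheus cannot look up a target that sits on a different one.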

Solution: Run docker network inspect monitor-net. All 5 monitoring containers and camera-manager must be listed in the Containers section. If any are missing, verify the network definitions in docker-compose.yml.
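Reading the inspect output by eye is error-prone, so the membership check can be automated. The sketch below assumes you have saved the output with docker network inspect monitor-net > net.json; the function name missing_containers is hypothetical, and the EXPECTED list contains only the container names mentioned on this page, so extend it with the rest of your stack.

```python
from __future__ import annotations

import json
import sys

# Only the names mentioned in this guide; add your remaining monitoring containers.
EXPECTED = ["monitoring-cadvisor", "monitoring-node-exporter", "camera-manager"]


def missing_containers(inspect_json: str, expected: list[str]) -> list[str]:
    """Return the expected container names that are not attached to the network.

    inspect_json is the raw output of `docker network inspect <network>`,
    a JSON array whose entries carry a "Containers" mapping of id -> details.
    """
    attached = set()
    for network in json.loads(inspect_json):
        for container in (network.get("Containers") or {}).values():
            attached.add(container.get("Name"))
    return [name for name in expected if name not in attached]


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], encoding="utf-8") as handle:
        missing = missing_containers(handle.read(), EXPECTED)
    print("missing:", ", ".join(missing) if missing else "none")
```

Any name printed as missing needs monitor-net added to its networks entry in docker-compose.yml.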
