Troubleshooting
Common issues encountered when deploying the monitoring stack and how to solve them.
"NVIDIA CUDA: NO" in Logs
Symptoms:
- camera-manager logs show NVIDIA CUDA: NO during startup.
- Log error: Could not find decoder 'h264_cuvid'.
- High CPU usage (ffmpeg falling back to software decoding).
Cause:
The ultralytics package (YOLO) declares opencv-python (the CPU-only build) as a dependency. Pip installs it automatically, and it replaces the custom CUDA-enabled OpenCV build provided by the base image.
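To confirm this cause inside the container, you can list the installed distributions and look for the CPU packages. The following is a minimal sketch, not part of the stack itself: the function names are hypothetical, and the package set is limited to the two names used in the uninstall step of this section.

```python
from importlib import metadata

# Package names taken from the uninstall step in this section.
CPU_OPENCV_PACKAGES = {"opencv-python", "opencv-python-headless"}


def conflicting_opencv_packages(installed_names):
    """Return the CPU OpenCV packages found in a list of distribution names."""
    # Normalize case and underscores so "OpenCV_Python" matches "opencv-python".
    normalized = {
        name.lower().replace("_", "-") for name in installed_names if name
    }
    return sorted(normalized & CPU_OPENCV_PACKAGES)


def installed_distribution_names():
    """List the names of all distributions visible in this environment."""
    return [dist.metadata["Name"] for dist in metadata.distributions()]
```

Calling `conflicting_opencv_packages(installed_distribution_names())` inside the camera-manager container should return an empty list once the image is built correctly.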
Solution:
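You must explicitly uninstall the CPU packages in the Dockerfile, in the same RUN step that installs the requirements:
# Correct sequence: install the requirements first, then remove the CPU-only OpenCV packages that pip pulled in
RUN pip install -r requirements.cuda.txt && \
    pip uninstall -y opencv-python opencv-python-headless
Then rebuild the image and recreate the containers: docker-compose up -d --build --force-recreate.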
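After rebuilding, you can verify the fix without waiting for the startup logs. The sketch below parses the build summary returned by `cv2.getBuildInformation()`, which contains the same "NVIDIA CUDA" line. The helper names are hypothetical, and it assumes the summary keeps the `NVIDIA CUDA: YES/NO` layout seen in the logs.

```python
import re


def opencv_reports_cuda(build_info: str) -> bool:
    """Return True if the 'NVIDIA CUDA' line of the build summary says YES."""
    match = re.search(
        r"^[ \t]*NVIDIA CUDA:[ \t]*(\S+)", build_info, flags=re.MULTILINE
    )
    return bool(match) and match.group(1).upper().startswith("YES")


def installed_opencv_has_cuda() -> bool:
    """Check the OpenCV build that Python actually imports in this environment."""
    import cv2  # imported lazily so the parser above works without OpenCV

    # cv2.__file__ shows which copy was loaded (pip wheel vs. base image build).
    print("cv2 loaded from:", cv2.__file__)
    return opencv_reports_cuda(cv2.getBuildInformation())
```

Run `installed_opencv_has_cuda()` inside the camera-manager container. If it returns False, the CPU package is most likely still installed.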
Grafana: "Datasource not found"
Symptoms:
- Dashboards show red error boxes.
- Error message: Datasource ${DS_PROMETHEUS} was not found.
Cause:
This happens when you import dashboards exported from other systems that use template variables for datasources (such as ${DS_PROMETHEUS}), while your Grafana instance expects a hardcoded UID (e.g., prometheus).
Solution:
- Use hardcoded UID: In datasource.yml, ensure uid: prometheus.
- Clean JSON: Replace "${DS_PROMETHEUS}" with "prometheus" in all dashboard JSON files.
- Delete Cache: Grafana stores dashboards in its SQLite DB. If you updated the JSON file but see no change, Grafana is serving the cached version.
  - Fix: Set allowUiUpdates: false in dashboard.yml to force file-based loading, or delete the grafana-data volume.
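The "Clean JSON" step can be scripted rather than done by hand. This is a minimal sketch, assuming the dashboards are `.json` files under a single directory; the function names and the directory argument are hypothetical, and the script does a plain text replacement, so back up the files first.

```python
from __future__ import annotations

from pathlib import Path

PLACEHOLDER = "${DS_PROMETHEUS}"


def pin_datasource(text: str, uid: str = "prometheus") -> str:
    """Replace the datasource template variable with a hardcoded UID."""
    return text.replace(PLACEHOLDER, uid)


def pin_datasource_in_dir(dashboard_dir: str, uid: str = "prometheus") -> list[str]:
    """Rewrite every dashboard JSON file in place; return the changed file names."""
    changed = []
    for path in sorted(Path(dashboard_dir).rglob("*.json")):
        original = path.read_text(encoding="utf-8")
        updated = pin_datasource(original, uid)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path.name)
    return changed
```

The returned list shows which dashboards still contained the placeholder. Restart Grafana afterwards so the provisioned files are read again.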
"Server Misbehaving" (DNS Error)
Symptoms:
- Prometheus Targets page shows down for monitoring-cadvisor or monitoring-node-exporter.
- Error: server returned HTTP status 503 or dial tcp: lookup monitoring-cadvisor on ...: no such host.
Cause:
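The containers are likely not on the same Docker network (monitor-net). Docker resolves container names only between containers attached to the same user-defined network, so a container outside monitor-net cannot be looked up by name.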
Solution:
Run docker network inspect monitor-net. All 5 monitoring containers and camera-manager must be listed in the Containers section of the output. If any are missing, verify the network definitions in docker-compose.yml.
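The membership check can also be automated by parsing the JSON that `docker network inspect` prints. This is a rough sketch with hypothetical function names. The only container names it can take from this page are monitoring-cadvisor, monitoring-node-exporter and camera-manager; add the remaining monitoring containers from your own docker-compose.yml.

```python
from __future__ import annotations

import json
import subprocess


def missing_from_network(inspect_output: str, expected: list[str]) -> list[str]:
    """Return the expected container names absent from the inspect output."""
    networks = json.loads(inspect_output)  # the CLI prints a JSON array
    attached = {
        info.get("Name")
        for net in networks
        for info in (net.get("Containers") or {}).values()
    }
    return [name for name in expected if name not in attached]


def inspect_network(name: str = "monitor-net") -> str:
    """Run `docker network inspect` and return its raw JSON output."""
    result = subprocess.run(
        ["docker", "network", "inspect", name],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For example, `missing_from_network(inspect_network(), ["monitoring-cadvisor", "monitoring-node-exporter", "camera-manager"])` returns the names that still need to be attached to monitor-net.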