
πŸ‘οΈ Architecture: Distributed Batch Inference

The FirstBreath Vision System is designed to solve a critical scalability problem: Python threads cannot run heavy AI inference loops concurrently, because the GIL (Global Interpreter Lock) allows only one thread to execute Python bytecode at a time.

To work around this, we implemented a Producer-Consumer pipeline that uses Redis as a high-speed message broker between independent processes.

The 3-Stage Pipeline

1. πŸ“· Camera Manager (The Producer)

  • Service: services/camera-manager
  • Role: I/O Bound
  • Logic:
    • Connects to RTSP streams via OpenCV (Hardware Decoded).
    • Resizes frames to model input size (640x640) immediately.
    • Serializes frames (Binary/JPEG) and pushes them to the batch_frames Redis List.
  • Why?: Keeps heavy I/O away from the GPU process, and scales horizontally (multiple managers can feed hundreds of cameras).
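The producer's push path can be sketched as follows. This is a minimal illustration, not the project's actual code: an in-memory deque stands in for the Redis `batch_frames` list, the header layout and helper names are invented, and the OpenCV/redis-py calls the real service would use are shown in comments.

```python
import struct
import time
from collections import deque

# Stand-in for the Redis "batch_frames" list; the real service would
# push with redis-py, e.g. redis.Redis(...).rpush("batch_frames", payload).
batch_frames = deque()

def encode_frame(camera_id: int, jpeg_bytes: bytes) -> bytes:
    """Frame the payload so the worker can route results back by camera.
    Layout (illustrative): camera_id (u32) | timestamp_ms (u64) | JPEG bytes."""
    return struct.pack(">IQ", camera_id, int(time.time() * 1000)) + jpeg_bytes

def produce(camera_id: int, raw_frame: bytes) -> None:
    # In the real service the frame comes from an RTSP stream and is
    # resized/encoded first:
    #   frame = cv2.resize(frame, (640, 640))
    #   jpeg = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, 85])[1]
    batch_frames.append(encode_frame(camera_id, raw_frame))

produce(camera_id=3, raw_frame=b"\xff\xd8fake-jpeg")
cam, ts = struct.unpack(">IQ", batch_frames[0][:12])
print(cam)  # 3
```

Prefixing each payload with its camera ID is what lets the batch worker later split a single GPU result back out per stream.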

2. 🧠 Batch Inference (The Worker)

  • Service: services/batch-inference
  • Role: Compute Bound (GPU)
  • Logic:
    • Pulls N frames from Redis at once (Dynamic Batching).
    • Constructs a single Tensor [BatchSize, 3, 640, 640].
    • Runs inference once on the GPU.
    • Splits results back by Camera ID.
    • Pushes raw bounding boxes to batch_results.
  • Performance: Increases throughput by ~400% compared to sequential processing.
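The batching steps above can be sketched as below. The queue-draining loop stands in for a pipelined `LPOP` against Redis, and the model is a stub (the real worker runs ONNX Runtime / TensorRT); function names and the demo data are illustrative only.

```python
import numpy as np

def pull_batch(queue, max_batch=32):
    """Drain up to max_batch (camera_id, frame) items; in production this
    would be a pipelined LPOP against the batch_frames Redis list."""
    batch = []
    while queue and len(batch) < max_batch:
        batch.append(queue.pop(0))
    return batch

def infer_batch(queue, model):
    items = pull_batch(queue)
    if not items:
        return {}
    cam_ids = [cam for cam, _ in items]
    # Stack the per-camera frames into one [B, 3, 640, 640] tensor.
    tensor = np.stack([frame for _, frame in items])
    outputs = model(tensor)          # one pass for the whole batch
    # Split results back out by camera ID.
    return dict(zip(cam_ids, outputs))

# Demo with a stub model returning one result per batch item.
stub_model = lambda x: [f"boxes-for-batch-item-{i}" for i in range(x.shape[0])]
q = [(7, np.zeros((3, 640, 640), dtype=np.float32)),
     (9, np.zeros((3, 640, 640), dtype=np.float32))]
results = infer_batch(q, stub_model)
print(sorted(results))  # [7, 9]
```

The gain comes from amortizing kernel-launch and memory-transfer overhead: one forward pass over a `[B, 3, 640, 640]` tensor instead of B separate passes.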

3. βš™οΈ Redis Worker (The Logic)

  • Service: services/redis-worker
  • Role: CPU / Logic Bound
  • Logic:
    • Consumes detection results.
    • Post-processing: Non-Maximum Suppression (NMS) and filtering of low-confidence detections.
    • Business Logic: "Is the horse down?", "Is it moving too fast?".
    • Smoothing: Applies sliding window filters to prevent false positives.
    • Persistence: Sends alerts to Backend.

Key Technologies

πŸš€ TensorRT & YOLOv11

We use ONNX Runtime (GPU) with the TensorRT execution provider. The YOLOv11 model is exported with dynamic batch size support, so a single pass can process anywhere from 1 to 32 cameras.
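A session configured this way might look like the sketch below. The execution-provider names and option keys are real ONNX Runtime identifiers, but the values (workspace size, FP16, file and input names) are illustrative, and the session creation is commented out since it requires a GPU and the exported model.

```python
# Execution-provider priority: try TensorRT first, fall back to plain
# CUDA kernels, then CPU. Option values here are illustrative.
providers = [
    ("TensorrtExecutionProvider", {
        "trt_max_workspace_size": 2 << 30,  # 2 GiB engine-build workspace
        "trt_fp16_enable": True,            # FP16 kernels where supported
    }),
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]

# On a GPU machine with onnxruntime-gpu installed:
# import onnxruntime as ort
# session = ort.InferenceSession("yolov11-dynamic.onnx", providers=providers)
# outputs = session.run(None, {"images": batch_tensor})  # [B, 3, 640, 640]
```

The dynamic batch dimension comes from the export step: marking axis 0 of the input as dynamic (e.g. via `dynamic_axes` in `torch.onnx.export`) is what lets the same engine accept batches of 1 to 32 frames.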

⚑ Redis & Serialization

To achieve minimal latency (<50ms):

  • Frames are encoded as JPEG (Quality 85) before transmission to reduce bandwidth.
  • Redis is configured as an in-memory ephemeral store (no persistence for frame queues).
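A Redis configuration for this ephemeral mode might look like the fragment below. The directives are standard `redis.conf` settings, but the specific values are assumptions, not the deployment's actual config.

```conf
# Disable persistence: frame queues are ephemeral, and losing them on
# restart is acceptable (cameras keep producing fresh frames).
save ""
appendonly no

# Bound memory so a stalled consumer cannot exhaust RAM.
maxmemory 2gb
maxmemory-policy allkeys-lru
```

Durable state (alerts, events) does not live here; it is handed off to the Backend by the Redis Worker, so losing the frame queues never loses an alert.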