VLM Preliminary Testing

Models Shortlisted

1. Qwen3.5-35B-A3B

2. Qwen3.5-122B-A10B (SELECTED)

3. Qwen3.5-397B-A17B

Performance: less than 2 seconds
Average inference time to describe image

SCAN_AREA Integration

WORKING

Capture frame to Base64 JPEG to modelapi VLM to Scene analysis to Keyword-based replan trigger

Technical Details

Model: klonyxh200-srv05-p8023 (Qwen Vision)
Hardware: H200 x1 GPU
Endpoint: https://modelapi.klass.dev/v1/chat/completions
Protocol: OpenAI-compatible HTTP POST
Auth: None (internal service)

Dynamic Replan Trigger

VLM analysis scans for 13 hazard keywords:

chemical, spill, gas, vapor, smoke, leak, fire, injury, trapped, unconscious, collapsed, deceased, help

Test Coverage

18/18 unit tests passing — Config, payload structure, base64 encoding, error handling