Vision Analysis

The vision module provides AI-powered computer vision analysis of robot camera feeds, with processing performed on the Appliance. Analysis can use multiple providers, including OpenAI GPT-4 Vision, COCO-SSD object detection, motion detection, and geometric shape detection with color recognition.

configureVisionProvider(providerName, config)

Configure a vision provider on the Appliance (e.g., set API key).

Parameters:

  • providerName (string) - Provider name (e.g., 'openai-gpt4v', 'yolo', 'motion_detection')
  • config (object) - Provider configuration
    • apiKey (string) - API key for providers that require it
    • model (string) - Model name for AI providers
    • maxTokens (number) - Max tokens for AI providers
    • sensitivity (number) - Sensitivity for motion detection (1-100, default: 15)
    • confidenceThreshold (number) - Confidence threshold for object detection (0.0-1.0, default: 0.5)
    • maxImageWidth (number) - Maximum input image width for performance optimization (default: 480)
    • modelBase (string) - YOLO model base: 'lite_mobilenet_v2' (fast, default), 'mobilenet_v1' (balanced), 'mobilenet_v2' (accurate)
    • maxDetections (number) - Maximum detections per frame (1-100, default: 20)

Returns: Promise<boolean> - True if provider was configured successfully

YOLO Configuration Options:

Option | Default | Description
modelBase | 'lite_mobilenet_v2' | Model architecture: 'lite_mobilenet_v2' (fastest), 'mobilenet_v1' (balanced), 'mobilenet_v2' (most accurate)
confidenceThreshold | 0.5 | Minimum confidence score (0.25-0.35 for more detections, 0.5+ for fewer false positives)
maxDetections | 20 | Maximum objects detected per frame (up to 100 for crowded scenes)
maxImageWidth | 480 | Input image width (higher = better small-object detection but slower; 640-800 for accuracy)
JS
// Configure OpenAI provider with API key
await oloClient.vision.configureVisionProvider('openai-gpt4v', {
  apiKey: 'your-openai-api-key',
  model: 'gpt-4o',
  maxTokens: 300
});

// Configure YOLO for maximum accuracy (slower)
await oloClient.vision.configureVisionProvider('yolo', {
  modelBase: 'mobilenet_v2',      // Most accurate model
  confidenceThreshold: 0.35,      // Lower threshold = more detections
  maxDetections: 50,              // More objects per frame
  maxImageWidth: 640              // Higher resolution for small objects
});

// Configure YOLO for maximum speed (default settings)
await oloClient.vision.configureVisionProvider('yolo', {
  modelBase: 'lite_mobilenet_v2', // Fastest model (default)
  confidenceThreshold: 0.5,       // Balanced threshold (default)
  maxDetections: 20,              // Standard limit (default)
  maxImageWidth: 480              // Fast processing (default)
});

// Configure motion detection sensitivity
await oloClient.vision.configureVisionProvider('motion_detection', {
  sensitivity: 15
});

// Configure jsfeat shape detection (contour analysis + fill ratio, runs locally)
await oloClient.vision.configureVisionProvider('jsfeat_shapes', {
  minArea: 400,            // Minimum shape area in pixels (filters noise)
  cannyLowThreshold: 25,   // Canny edge detection low threshold
  cannyHighThreshold: 100, // Canny edge detection high threshold
  analysisWidth: 280       // Image width for processing
});
// Detects 3D objects: spheres, cubes, cuboids with color (red, blue, green, etc.)

JSFeat Shape Detection Options:

Option | Default | Description
minArea | 400 | Minimum contour area in pixels (filters small noise)
maxAreaFraction | 0.8 | Maximum shape area as a fraction of the image (filters full-frame detections)
cannyLowThreshold | 25 | Canny edge detection low threshold
cannyHighThreshold | 100 | Canny edge detection high threshold
analysisWidth | 280 | Image width for processing (lower = faster)

Detected Shape Classes: class names are prefixed with the detected color, e.g., "red sphere", "blue cube", "green cuboid". The shape portion is one of:

  • sphere - Circular/spherical objects (high circularity, ~1:1 aspect ratio)
  • ellipse - Oval/oblong shapes (high circularity, non-square aspect ratio)
  • cube - Square-ish 3D objects (4-5 vertices, ~1:1 aspect ratio)
  • cuboid - Rectangular 3D objects (4-7 vertices, rectangular aspect ratio)
  • triangle - 3-vertex shapes
  • hexagon - 6-vertex shapes
  • polygon - Irregular shapes with 8+ vertices

Detected Colors:

  • red, orange, yellow, green, cyan, blue, purple, pink
  • white, gray, black
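Because the combined class string is simply "<color> <shape>", detections can be filtered by either attribute with plain string handling, no SDK calls required. A minimal sketch (the sample detections are illustrative):

```javascript
// Split a combined class string like "red sphere" into color and shape parts.
function parseShapeClass(className) {
  const [color, ...shape] = className.split(' ');
  return { color, shape: shape.join(' ') };
}

// Keep only detections matching a given color.
function filterByColor(detections, color) {
  return detections.filter(d => parseShapeClass(d.class).color === color);
}

// Illustrative detections in the jsfeat_shapes result shape:
const detections = [
  { class: 'red sphere', confidence: 0.85 },
  { class: 'blue cube', confidence: 0.70 },
  { class: 'red cuboid', confidence: 0.60 }
];
console.log(filterByColor(detections, 'red').length); // 2
```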

Detection Properties: Each detection includes additional properties:

JS
{
  class: "red sphere",           // Color + shape
  confidence: 0.85,
  bbox: { x, y, width, height }, // Bounding box coordinates
  properties: {
    shape: "sphere",             // Shape only
    color: "red",                // Color only
    vertices: 12,                // Simplified polygon vertices
    circularity: 0.92,           // 0.0-1.0, higher = more circular
    area: 1250,                  // Area in pixels
    aspectRatio: 1.02            // Width/height ratio
  }
}

getAvailableVisionProviders()

Get available vision providers and their capabilities from the Appliance.

Returns: Promise<Array> - Array of provider info objects

JS
const providers = await oloClient.vision.getAvailableVisionProviders();
console.log('Available providers:', providers);
// Returns: [{
//   name: 'yolo',
//   displayName: 'COCO-SSD Object Detection',
//   capabilities: [...],
//   requiresApiKey: false,
//   isConfigured: true,
//   isCustom: false
// }]

registerCustomProvider(providerName, providerConfig)

Register a custom vision provider with user-defined analysis logic. The provider persists in appliance memory until it is unregistered or the appliance restarts. Custom providers can be used with startVisionAnalysis() just like built-in providers.

Parameters:

  • providerName (string) - Unique name for the provider (cannot override built-in providers: 'yolo', 'motion_detection', 'openai-gpt4v', 'jsfeat_shapes')
  • providerConfig (object) - Provider configuration
    • analyzeFrame (function) - Analysis function: async (imageBuffer, config) => result
      • imageBuffer (Buffer/bytes) - JPEG image data
      • config (object) - Configuration passed from startVisionAnalysis
      • Returns: An object with analysis results
    • displayName (string, optional) - Human-readable name for the provider
    • description (string, optional) - Description of what the provider does
    • capabilities (string[], optional) - List of capabilities (e.g., ['object_detection', 'bounding_boxes'])

Returns: Promise<boolean> - True if provider was registered successfully

Notes:

  • JavaScript sandbox globals: Buffer, sharp (image processing), Math, JSON, Date, console, Promise, setTimeout, setImmediate
  • Python: functions are executed in-process in the SDK Playground. For external scripts, the function must be defined in a source file. Any Python packages installed on the appliance can be used (PIL, OpenCV, numpy, etc.).

Example - Brightness Detection:

JS
// Register a custom brightness detector
await oloClient.vision.registerCustomProvider('brightness_detector', {
  displayName: 'Brightness Detector',
  description: 'Analyzes average image brightness',
  capabilities: ['custom_analysis'],
  analyzeFrame: async function(imageBuffer, config) {
    // Use sharp (available in sandbox) to process image
    const { data, info } = await sharp(imageBuffer)
      .resize(160, 120)
      .grayscale()
      .raw()
      .toBuffer({ resolveWithObject: true });

    // Calculate average brightness
    let total = 0;
    for (let i = 0; i < data.length; i++) {
      total += data[i];
    }
    const avgBrightness = total / data.length;

    return {
      brightness: avgBrightness,
      isLight: avgBrightness > 128,
      message: avgBrightness > 128 ? 'Scene is well lit' : 'Scene is dark'
    };
  }
});

// Use the custom provider like any built-in provider
const analysisId = await oloClient.vision.startVisionAnalysis(cameraTopic, 'brightness_detector', {
  intervalMs: 1000,
  onResult: (result) => {
    console.log(`Brightness: ${result.brightness.toFixed(1)}, ${result.message}`);
  }
});

Example - Color Region Detection with Bounding Boxes:

JS
// Register a custom color detector that returns bounding boxes
await oloClient.vision.registerCustomProvider('red_detector', {
  displayName: 'Red Region Detector',
  description: 'Detects red-colored regions in the image',
  capabilities: ['object_detection', 'bounding_boxes'],
  analyzeFrame: async function(imageBuffer, config) {
    const { data, info } = await sharp(imageBuffer)
      .resize(320, 240)
      .raw()
      .toBuffer({ resolveWithObject: true });

    const detections = [];
    const width = info.width;
    const height = info.height;
    const channels = info.channels;

    // Grid-based red region detection
    const gridSize = 32;
    for (let gy = 0; gy < height; gy += gridSize) {
      for (let gx = 0; gx < width; gx += gridSize) {
        let redCount = 0;
        let totalPixels = 0;
        for (let y = gy; y < Math.min(gy + gridSize, height); y++) {
          for (let x = gx; x < Math.min(gx + gridSize, width); x++) {
            const idx = (y * width + x) * channels;
            const r = data[idx];
            const g = data[idx + 1];
            const b = data[idx + 2];
            // Check if pixel is "red"
            if (r > 150 && r > g * 1.5 && r > b * 1.5) {
              redCount++;
            }
            totalPixels++;
          }
        }
        // If more than 30% of cell is red, add detection
        if (redCount / totalPixels > 0.3) {
          detections.push({
            class: 'red_region',
            confidence: redCount / totalPixels,
            bbox: { x: gx, y: gy, width: gridSize, height: gridSize }
          });
        }
      }
    }

    // Return object_detection type to enable bounding box rendering
    return {
      type: 'object_detection',
      detections: detections,
      inputFrame: { width, height }
    };
  }
});

// Use with bounding boxes
const analysisId = await oloClient.vision.startVisionAnalysis(cameraTopic, 'red_detector', {
  intervalMs: 500,
  showBoundingBoxes: true,
  videoElement: videoElement,
  onResult: (result) => {
    if (result.detections.length > 0) {
      console.log(`Detected ${result.detections.length} red region(s)`);
    }
  }
});

unregisterCustomProvider(providerName)

Unregister a custom vision provider. Cannot unregister built-in providers or providers currently in use.

Parameters:

  • providerName (string) - Name of the provider to unregister

Returns: Promise<boolean> - True if unregistered successfully

JS
// JavaScript
await oloClient.vision.unregisterCustomProvider('brightness_detector');

startVisionAnalysis(topic, providerName, config)

Start vision analysis on a camera topic using Appliance-side processing.

Parameters:

  • topic (string) - Camera topic to analyze (e.g., '/camera/image_raw/compressed')
  • providerName (string) - Name of vision provider to use
  • config (object, optional) - Analysis configuration
    • onResult (function) - Callback for analysis results
    • onError (function) - Callback for analysis errors
    • onStopped (function) - Callback when analysis is stopped
    • prompt (string) - Optional prompt for LLM providers
    • intervalMs (number) - Analysis interval in milliseconds (default: 500)
    • showBoundingBoxes (boolean) - Enable visual bounding boxes (for object detection)
    • boundingBoxOptions (object) - Options for bounding box rendering
    • videoElement (HTMLVideoElement) - Video element for bounding box rendering (required if showBoundingBoxes is true). Note: You must start the video stream with startVideo() before starting vision analysis for bounding boxes to display.
    • confidenceThreshold (number) - Per-call override for object detection confidence (0.0-1.0)
    • maxDetections (number) - Per-call override for max detections per frame (1-100)
    • maxImageWidth (number) - Per-call override for input image width

Returns: Promise<string> - Analysis session ID

JS
// Auto-detect best camera topic
const { bestTopic } = await oloClient.video.detectVideoTopics();

// Start motion detection
const motionAnalysisId = await oloClient.vision.startVisionAnalysis(bestTopic, 'motion_detection', {
  intervalMs: 1000,
  onResult: (result) => {
    if (result.motion_detected) {
      console.log(`Motion level: ${result.motion_level}%`);
    }
  }
});

// Start object detection with visual bounding boxes
const objectAnalysisId = await oloClient.vision.startVisionAnalysis(bestTopic, 'yolo', {
  intervalMs: 400,
  showBoundingBoxes: true,
  videoElement: videoElement,
  boundingBoxOptions: {
    strokeWidth: 3,
    fontSize: 14,
    showConfidence: true,
    showLabels: true
  },
  // Per-call overrides (optional - these override provider defaults for this session):
  // confidenceThreshold: 0.4, // Lower threshold to detect more objects
  // maxDetections: 50,        // Allow more detections per frame
  // maxImageWidth: 640,       // Higher resolution for small objects
  onResult: (result) => {
    result.detections.forEach(detection => {
      console.log(`Found ${detection.class} (${(detection.confidence * 100).toFixed(1)}%)`);
    });
  }
});

// Start OpenAI GPT-4 Vision analysis
const aiAnalysisId = await oloClient.vision.startVisionAnalysis(bestTopic, 'openai-gpt4v', {
  prompt: 'Describe what the robot should be aware of for navigation',
  intervalMs: 5000,
  onResult: (result) => {
    console.log('AI Analysis:', result.content);
  }
});

// Start jsfeat shape detection with color (spheres, cubes, cuboids, etc.)
// Uses Canny edge detection + contour analysis + fill ratio - runs locally, no ML model required
const shapeAnalysisId = await oloClient.vision.startVisionAnalysis(bestTopic, 'jsfeat_shapes', {
  intervalMs: 500, // Fast - no neural network inference
  showBoundingBoxes: true,
  videoElement: videoElement,
  // Optional per-call overrides:
  // minArea: 300,           // Detect smaller shapes
  // cannyLowThreshold: 20,  // Lower = more edges detected
  // cannyHighThreshold: 80, // Upper threshold for edge detection
  onResult: (result) => {
    result.detections.forEach(detection => {
      // detection.class includes color, e.g., "red sphere", "blue cube", "green cuboid"
      console.log(`Found ${detection.class} (${(detection.confidence * 100).toFixed(1)}%)`);
      console.log(`  Position: x=${detection.bbox.x}, y=${detection.bbox.y}`);
      console.log(`  Size: ${detection.bbox.width}x${detection.bbox.height}`);
      console.log(`  Shape: ${detection.properties.shape}, Color: ${detection.properties.color}`);
    });
  }
});
// Detectable shapes: sphere, cube, cuboid, triangle, ellipse, hexagon, polygon
// Detectable colors: red, orange, yellow, green, cyan, blue, purple, pink, white, gray, black

stopVisionAnalysis(analysisId)

Stop a specific vision analysis session.

Parameters:

  • analysisId (string) - Analysis session ID returned from startVisionAnalysis()

Returns: Promise<Object> - Stop result with completion information

JS
await oloClient.vision.stopVisionAnalysis(analysisId);

stopAllVisionAnalysis()

Stop all active vision analysis sessions.

Returns: Promise<void> - Resolves when all sessions are stopped

JS
await oloClient.vision.stopAllVisionAnalysis();

getVisionAnalysisResults(analysisId, options)

Get analysis results from a session.

Parameters:

  • analysisId (string) - Analysis session ID
  • options (object, optional) - Options for result retrieval
    • limit (number) - Maximum number of results to return
    • since (number) - Only return results after this timestamp

Returns: Promise<Array> - Array of analysis results with timestamps

JS
const results = await oloClient.vision.getVisionAnalysisResults(analysisId, {
  limit: 10,
  since: Date.now() - 30000 // Last 30 seconds
});
console.log('Recent results:', results);

getActiveVisionAnalysisSessions()

Get information about active analysis sessions from the Appliance.

Returns: Promise<Array> - Array of active session info objects

JS
const activeSessions = await oloClient.vision.getActiveVisionAnalysisSessions();
console.log('Active sessions:', activeSessions);
// Returns: [{ analysisId: 'uuid', topic: '/camera/topic', provider: 'yolo', frameCount: 150, duration: '30s' }]

Bounding Box Visualization

The vision system supports real-time bounding box overlays on video feeds for object detection providers.

Detection Result Structure:

JS
{
  type: 'object_detection',     // Type of analysis
  provider: 'yolo',             // Provider name
  timestamp: 1766503932894,     // Timestamp of analysis
  frameNumber: 15,              // Frame number in the analysis session
  detections: [
    {
      class: 'person',          // Object class name
      confidence: 0.87,         // Confidence score (0.0 to 1.0)
      bbox: {                   // Bounding box coordinates (original image pixels)
        x: 150, y: 200,         // Top-left corner
        width: 120, height: 180 // Dimensions
      }
    }
  ],
  totalObjects: 1,              // Total number of detected objects
  confidenceThreshold: 0.5,     // Confidence threshold used
  modelBase: 'mobilenet_v2',    // Model used for detection
  maxDetections: 20,            // Maximum detections setting
  processingTimeMs: 91,         // Processing time in milliseconds
  inputFrame: {                 // Original input frame dimensions
    width: 640, height: 480
  },
  analysisInput: {              // Dimensions used for analysis (may be downscaled)
    width: 480, height: 360
  }
}
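A result in this shape can be summarized entirely client-side, without further SDK calls. A small sketch that tallies detections per class (the sample result is illustrative):

```javascript
// Count detections per class in an object_detection result.
function countByClass(result) {
  const counts = {};
  for (const d of result.detections) {
    counts[d.class] = (counts[d.class] || 0) + 1;
  }
  return counts;
}

// Illustrative result following the structure above:
const sampleResult = {
  type: 'object_detection',
  detections: [
    { class: 'person', confidence: 0.87, bbox: { x: 150, y: 200, width: 120, height: 180 } },
    { class: 'person', confidence: 0.62, bbox: { x: 10, y: 40, width: 90, height: 160 } },
    { class: 'chair', confidence: 0.71, bbox: { x: 300, y: 220, width: 60, height: 80 } }
  ]
};
console.log(countByClass(sampleResult)); // { person: 2, chair: 1 }
```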

Automatic Rendering:

JS
// Enable built-in bounding boxes with custom styling
const analysisId = await oloClient.vision.startVisionAnalysis(topic, 'yolo', {
  showBoundingBoxes: true,
  videoElement: videoElement,
  boundingBoxOptions: {
    strokeWidth: 3,
    fontSize: 14,
    showConfidence: true,
    showLabels: true
  },
  onResult: (result) => {
    // Bounding boxes are automatically rendered on video
    console.log(`Detected ${result.detections.length} objects`);
  }
});

Custom Rendering:

JS
// Disable automatic rendering to implement your own
const analysisId = await oloClient.vision.startVisionAnalysis(topic, 'yolo', {
  showBoundingBoxes: false, // Disable automatic rendering
  onResult: (result) => {
    // Use raw detection data for custom visualization
    result.detections.forEach(detection => {
      const { bbox, class: className, confidence } = detection;

      // Scale coordinates from image size to video display size
      const videoRect = videoElement.getBoundingClientRect();
      const scaleX = videoRect.width / result.inputFrame.width;
      const scaleY = videoRect.height / result.inputFrame.height;

      const x = bbox.x * scaleX;
      const y = bbox.y * scaleY;
      const width = bbox.width * scaleX;
      const height = bbox.height * scaleY;

      // Draw your custom bounding box here
      drawCustomBoundingBox(x, y, width, height, className, confidence);
    });
  }
});

Important: Detection coordinates are in original image pixels, not video display pixels. You must scale them to match your video element's display size.
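This scaling step can be factored into a small pure helper so it is easy to test in isolation; a sketch using the bbox and inputFrame fields from the detection result:

```javascript
// Scale a bbox from original-image pixels to the video element's display size.
function scaleBBox(bbox, inputFrame, displayWidth, displayHeight) {
  const scaleX = displayWidth / inputFrame.width;
  const scaleY = displayHeight / inputFrame.height;
  return {
    x: bbox.x * scaleX,
    y: bbox.y * scaleY,
    width: bbox.width * scaleX,
    height: bbox.height * scaleY
  };
}

// A 640x480 frame rendered in a 320x240 video element: everything halves.
const scaled = scaleBBox(
  { x: 150, y: 200, width: 120, height: 180 },
  { width: 640, height: 480 },
  320, 240
);
console.log(scaled); // { x: 75, y: 100, width: 60, height: 90 }
```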

Performance Optimization

The vision system includes several performance optimizations for real-time analysis:

CPU Optimization:

  • Image downscaling before inference with coordinate scaling back to original size
  • Configurable maximum input image width via configureVisionProvider()
  • Non-overlapping analysis processing to prevent CPU overload
  • TensorFlow.js memory management with automatic tensor cleanup

Network Optimization:

  • Appliance-side processing reduces client CPU usage
  • Throttled ROS topic subscriptions to balance frame rate and performance
  • Configurable analysis intervals to control processing frequency
JS
// Optimize for low-CPU devices
await oloClient.vision.configureVisionProvider('yolo', {
  maxImageWidth: 320,      // Smaller input for faster processing
  confidenceThreshold: 0.7 // Higher threshold for fewer false positives
});

// Use longer intervals for better performance
const analysisId = await oloClient.vision.startVisionAnalysis(topic, 'yolo', {
  intervalMs: 600, // Analyze every 600ms instead of default 500ms
  showBoundingBoxes: true,
  videoElement: videoElement
});

Vision-Based Robot Control Example

JS
// Stop robot when person is detected using camera topic directly
const { bestTopic } = await oloClient.video.detectVideoTopics();

await oloClient.vision.startVisionAnalysis(bestTopic, 'yolo', {
  intervalMs: 500,
  onResult: async (result) => {
    const people = result.detections.filter(d => d.class === 'person');
    if (people.length > 0) {
      await oloClient.core.stopRobot();
      console.log('Person detected - stopping robot for safety');
    }
  }
});
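With a 500 ms analysis interval, the onResult callback above will re-issue the stop command on every frame while a person remains in view. A cooldown gate avoids that; a generic sketch, independent of the SDK (the injectable clock is only there to make the behavior deterministic):

```javascript
// Returns a function that yields true at most once per cooldownMs, so a
// persistent detection doesn't re-trigger the same action on every frame.
function makeCooldownGate(cooldownMs, now = Date.now) {
  let lastTrigger = -Infinity;
  return () => {
    const t = now();
    if (t - lastTrigger >= cooldownMs) {
      lastTrigger = t;
      return true;
    }
    return false;
  };
}

// Simulated clock to demonstrate the behavior:
let fakeTime = 0;
const gate = makeCooldownGate(2000, () => fakeTime);
console.log(gate()); // true  (first trigger)
fakeTime = 500;
console.log(gate()); // false (still cooling down)
fakeTime = 2500;
console.log(gate()); // true  (cooldown elapsed)
```

Inside the onResult callback, the stop command would then be guarded by `if (people.length > 0 && gate()) { ... }`.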