Vision Analysis

The vision module provides AI-powered computer vision analysis of robot camera feeds, with processing performed on the Appliance. Analysis can use multiple providers, including OpenAI GPT-4 Vision, COCO-SSD object detection, motion detection, and geometric shape detection with color recognition.

configureVisionProvider(providerName, config)

Configure a vision provider on the Appliance (e.g., set API key).

Parameters:

  • providerName (string) - Provider name (e.g., 'openai-gpt4v', 'yolo', 'motion_detection')
  • config (object) - Provider configuration
    • apiKey (string) - API key for providers that require it
    • model (string) - Model name for AI providers
    • maxTokens (number) - Max tokens for AI providers
    • sensitivity (number) - Sensitivity for motion detection (1-100, default: 15)
    • confidenceThreshold (number) - Confidence threshold for object detection (0.0-1.0, default: 0.5)
    • maxImageWidth (number) - Maximum input image width for performance optimization (default: 480)
    • modelBase (string) - YOLO model base: 'lite_mobilenet_v2' (fast, default), 'mobilenet_v1' (balanced), 'mobilenet_v2' (accurate)
    • maxDetections (number) - Maximum detections per frame (1-100, default: 20)

Returns: Promise<boolean> - True if provider was configured successfully

YOLO Configuration Options:

Option | Default | Description
modelBase | 'lite_mobilenet_v2' | Model architecture: 'lite_mobilenet_v2' (fastest), 'mobilenet_v1' (balanced), 'mobilenet_v2' (most accurate)
confidenceThreshold | 0.5 | Minimum confidence score (0.25-0.35 for more detections, 0.5+ for fewer false positives)
maxDetections | 20 | Maximum objects detected per frame (up to 100 for crowded scenes)
maxImageWidth | 480 | Input image width (higher = better small-object detection but slower; 640-800 for accuracy)
JS
// Configure OpenAI provider with API key
await oloClient.vision.configureVisionProvider('openai-gpt4v', {
  apiKey: 'your-openai-api-key',
  model: 'gpt-4o',
  maxTokens: 300
});

// Configure YOLO for maximum accuracy (slower)
await oloClient.vision.configureVisionProvider('yolo', {
  modelBase: 'mobilenet_v2',      // Most accurate model
  confidenceThreshold: 0.35,      // Lower threshold = more detections
  maxDetections: 50,              // More objects per frame
  maxImageWidth: 640              // Higher resolution for small objects
});

// Configure YOLO for maximum speed (default settings)
await oloClient.vision.configureVisionProvider('yolo', {
  modelBase: 'lite_mobilenet_v2', // Fastest model (default)
  confidenceThreshold: 0.5,       // Balanced threshold (default)
  maxDetections: 20,              // Standard limit (default)
  maxImageWidth: 480              // Fast processing (default)
});

// Configure motion detection sensitivity
await oloClient.vision.configureVisionProvider('motion_detection', {
  sensitivity: 15
});

// Configure jsfeat shape detection (contour analysis + fill ratio, runs locally)
await oloClient.vision.configureVisionProvider('jsfeat_shapes', {
  minArea: 400,            // Minimum shape area in pixels (filters noise)
  cannyLowThreshold: 25,   // Canny edge detection low threshold
  cannyHighThreshold: 100, // Canny edge detection high threshold
  analysisWidth: 280       // Image width for processing
});
// Detects 3D objects: spheres, cubes, cuboids with color (red, blue, green, etc.)

JSFeat Shape Detection Options:

Option | Default | Description
minArea | 400 | Minimum contour area in pixels (filters small noise)
maxAreaFraction | 0.8 | Maximum shape area as a fraction of the image (filters full-frame detections)
cannyLowThreshold | 25 | Canny edge detection low threshold
cannyHighThreshold | 100 | Canny edge detection high threshold
analysisWidth | 280 | Image width for processing (lower = faster)

Detected Shape Classes: class names are prefixed with the detected color, e.g., "red sphere", "blue cube", "green cuboid". The shape portion is one of:

  • sphere - Circular/spherical objects (high circularity, ~1:1 aspect ratio)
  • ellipse - Oval/oblong shapes (high circularity, non-square aspect ratio)
  • cube - Square-ish 3D objects (4-5 vertices, ~1:1 aspect ratio)
  • cuboid - Rectangular 3D objects (4-7 vertices, rectangular aspect ratio)
  • triangle - 3-vertex shapes
  • hexagon - 6-vertex shapes
  • polygon - Irregular shapes with 8+ vertices

Detected Colors:

  • red, orange, yellow, green, cyan, blue, purple, pink
  • white, gray, black
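Because the combined class string is simply "<color> <shape>", detections can be filtered by either attribute with plain string handling, no SDK calls required. A minimal sketch (the sample detections are illustrative):

```javascript
// Split a combined class string like "red sphere" into color and shape parts.
function parseShapeClass(className) {
  const [color, ...shape] = className.split(' ');
  return { color, shape: shape.join(' ') };
}

// Keep only detections matching a given color.
function filterByColor(detections, color) {
  return detections.filter(d => parseShapeClass(d.class).color === color);
}

// Illustrative detections in the jsfeat_shapes result shape:
const detections = [
  { class: 'red sphere', confidence: 0.85 },
  { class: 'blue cube', confidence: 0.70 },
  { class: 'red cuboid', confidence: 0.60 }
];
console.log(filterByColor(detections, 'red').length); // 2
```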

Detection Properties: Each detection includes additional properties:

JS
{
  class: "red sphere",           // Color + shape
  confidence: 0.85,
  bbox: { x, y, width, height }, // Bounding box coordinates
  properties: {
    shape: "sphere",             // Shape only
    color: "red",                // Color only
    vertices: 12,                // Simplified polygon vertices
    circularity: 0.92,           // 0.0-1.0, higher = more circular
    area: 1250,                  // Area in pixels
    aspectRatio: 1.02            // Width/height ratio
  }
}

getAvailableVisionProviders()

Get available vision providers and their capabilities from the Appliance.

Returns: Promise<Array> - Array of provider info objects

JS
const providers = await oloClient.vision.getAvailableVisionProviders();
console.log('Available providers:', providers);
// Returns: [{
//   name: 'yolo',
//   displayName: 'COCO-SSD Object Detection',
//   capabilities: [...],
//   requiresApiKey: false,
//   isConfigured: true,
//   isCustom: false
// }]

registerCustomProvider(providerName, providerConfig)

Register a custom vision provider with user-defined analysis logic. The provider persists in appliance memory until it is unregistered or the appliance restarts. Custom providers can be used with startVisionAnalysis() just like built-in providers.

Parameters:

  • providerName (string) - Unique name for the provider (cannot override built-in providers: 'yolo', 'motion_detection', 'openai-gpt4v', 'jsfeat_shapes')
  • providerConfig (object) - Provider configuration
    • analyzeFrame (function) - Analysis function: async (imageBuffer, config) => result
      • imageBuffer (Buffer/bytes) - JPEG image data
      • config (object) - Configuration passed from startVisionAnalysis
      • Returns: An object with analysis results
    • displayName (string, optional) - Human-readable name for the provider
    • description (string, optional) - Description of what the provider does
    • capabilities (string[], optional) - List of capabilities (e.g., ['object_detection', 'bounding_boxes'])

Returns: Promise<boolean> - True if provider was registered successfully

Notes:

  • JavaScript sandbox globals: Buffer, sharp (image processing), Math, JSON, Date, console, Promise, setTimeout, setImmediate
  • Python: functions are executed in-process in the SDK Playground. For external scripts, the function must be defined in a source file. Any Python packages installed on the appliance can be used (PIL, OpenCV, numpy, etc.).

Example - Brightness Detection:

JS
// Register a custom brightness detector
await oloClient.vision.registerCustomProvider('brightness_detector', {
  displayName: 'Brightness Detector',
  description: 'Analyzes average image brightness',
  capabilities: ['custom_analysis'],
  analyzeFrame: async function(imageBuffer, config) {
    // Use sharp (available in sandbox) to process image
    const { data, info } = await sharp(imageBuffer)
      .resize(160, 120)
      .grayscale()
      .raw()
      .toBuffer({ resolveWithObject: true });

    // Calculate average brightness
    let total = 0;
    for (let i = 0; i < data.length; i++) {
      total += data[i];
    }
    const avgBrightness = total / data.length;

    return {
      brightness: avgBrightness,
      isLight: avgBrightness > 128,
      message: avgBrightness > 128 ? 'Scene is well lit' : 'Scene is dark'
    };
  }
});

// Use the custom provider like any built-in provider
const analysisId = await oloClient.vision.startVisionAnalysis(cameraTopic, 'brightness_detector', {
  intervalMs: 1000,
  onResult: (result) => {
    console.log(`Brightness: ${result.brightness.toFixed(1)}, ${result.message}`);
  }
});

Example - Color Region Detection with Bounding Boxes:

JS
// Register a custom color detector that returns bounding boxes
await oloClient.vision.registerCustomProvider('red_detector', {
  displayName: 'Red Region Detector',
  description: 'Detects red-colored regions in the image',
  capabilities: ['object_detection', 'bounding_boxes'],
  analyzeFrame: async function(imageBuffer, config) {
    const { data, info } = await sharp(imageBuffer)
      .resize(320, 240)
      .raw()
      .toBuffer({ resolveWithObject: true });

    const detections = [];
    const width = info.width;
    const height = info.height;
    const channels = info.channels;

    // Grid-based red region detection
    const gridSize = 32;
    for (let gy = 0; gy < height; gy += gridSize) {
      for (let gx = 0; gx < width; gx += gridSize) {
        let redCount = 0;
        let totalPixels = 0;
        for (let y = gy; y < Math.min(gy + gridSize, height); y++) {
          for (let x = gx; x < Math.min(gx + gridSize, width); x++) {
            const idx = (y * width + x) * channels;
            const r = data[idx];
            const g = data[idx + 1];
            const b = data[idx + 2];
            // Check if pixel is "red"
            if (r > 150 && r > g * 1.5 && r > b * 1.5) {
              redCount++;
            }
            totalPixels++;
          }
        }
        // If more than 30% of cell is red, add detection
        if (redCount / totalPixels > 0.3) {
          detections.push({
            class: 'red_region',
            confidence: redCount / totalPixels,
            bbox: { x: gx, y: gy, width: gridSize, height: gridSize }
          });
        }
      }
    }

    // Return object_detection type to enable bounding box rendering
    return {
      type: 'object_detection',
      detections: detections,
      inputFrame: { width, height }
    };
  }
});

// Use with bounding boxes
const analysisId = await oloClient.vision.startVisionAnalysis(cameraTopic, 'red_detector', {
  intervalMs: 500,
  showBoundingBoxes: true,
  videoElement: videoElement,
  onResult: (result) => {
    if (result.detections.length > 0) {
      console.log(`Detected ${result.detections.length} red region(s)`);
    }
  }
});

unregisterCustomProvider(providerName)

Unregister a custom vision provider. Cannot unregister built-in providers or providers currently in use.

Parameters:

  • providerName (string) - Name of the provider to unregister

Returns: Promise<boolean> - True if unregistered successfully

JS
// JavaScript
await oloClient.vision.unregisterCustomProvider('brightness_detector');

startVisionAnalysis(topic, providerName, config)

Start vision analysis on a camera topic using Appliance-side processing.

Parameters:

  • topic (string) - Camera topic to analyze (e.g., '/camera/image_raw/compressed')
  • providerName (string) - Name of vision provider to use
  • config (object, optional) - Analysis configuration
    • onResult (function) - Callback for analysis results
    • onError (function) - Callback for analysis errors
    • onStopped (function) - Callback when analysis is stopped
    • prompt (string) - Optional prompt for LLM providers
    • intervalMs (number) - Analysis interval in milliseconds (default: 500)
    • showBoundingBoxes (boolean) - Enable visual bounding boxes (for object detection)
    • boundingBoxOptions (object) - Options for bounding box rendering
    • videoElement (HTMLVideoElement) - Video element for bounding box rendering (required if showBoundingBoxes is true). Note: You must start the video stream with startVideo() before starting vision analysis for bounding boxes to display.
    • confidenceThreshold (number) - Per-call override for object detection confidence (0.0-1.0)
    • maxDetections (number) - Per-call override for max detections per frame (1-100)
    • maxImageWidth (number) - Per-call override for input image width

Returns: Promise<string> - Analysis session ID

JS
// Auto-detect best camera topic
const { bestTopic } = await oloClient.video.detectVideoTopics();

// Start motion detection
const motionAnalysisId = await oloClient.vision.startVisionAnalysis(bestTopic, 'motion_detection', {
  intervalMs: 1000,
  onResult: (result) => {
    if (result.motion_detected) {
      console.log(`Motion level: ${result.motion_level}%`);
    }
  }
});

// Start object detection with visual bounding boxes
const objectAnalysisId = await oloClient.vision.startVisionAnalysis(bestTopic, 'yolo', {
  intervalMs: 400,
  showBoundingBoxes: true,
  videoElement: videoElement,
  boundingBoxOptions: {
    strokeWidth: 3,
    fontSize: 14,
    showConfidence: true,
    showLabels: true
  },
  // Per-call overrides (optional - these override provider defaults for this session):
  // confidenceThreshold: 0.4, // Lower threshold to detect more objects
  // maxDetections: 50,        // Allow more detections per frame
  // maxImageWidth: 640,       // Higher resolution for small objects
  onResult: (result) => {
    result.detections.forEach(detection => {
      console.log(`Found ${detection.class} (${(detection.confidence * 100).toFixed(1)}%)`);
    });
  }
});

// Start OpenAI GPT-4 Vision analysis
const aiAnalysisId = await oloClient.vision.startVisionAnalysis(bestTopic, 'openai-gpt4v', {
  prompt: 'Describe what the robot should be aware of for navigation',
  intervalMs: 5000,
  onResult: (result) => {
    console.log('AI Analysis:', result.content);
  }
});

// Start jsfeat shape detection with color (spheres, cubes, cuboids, etc.)
// Uses Canny edge detection + contour analysis + fill ratio - runs locally, no ML model required
const shapeAnalysisId = await oloClient.vision.startVisionAnalysis(bestTopic, 'jsfeat_shapes', {
  intervalMs: 500, // Fast - no neural network inference
  showBoundingBoxes: true,
  videoElement: videoElement,
  // Optional per-call overrides:
  // minArea: 300,           // Detect smaller shapes
  // cannyLowThreshold: 20,  // Lower = more edges detected
  // cannyHighThreshold: 80, // Upper threshold for edge detection
  onResult: (result) => {
    result.detections.forEach(detection => {
      // detection.class includes color, e.g., "red sphere", "blue cube", "green cuboid"
      console.log(`Found ${detection.class} (${(detection.confidence * 100).toFixed(1)}%)`);
      console.log(`  Position: x=${detection.bbox.x}, y=${detection.bbox.y}`);
      console.log(`  Size: ${detection.bbox.width}x${detection.bbox.height}`);
      console.log(`  Shape: ${detection.properties.shape}, Color: ${detection.properties.color}`);
    });
  }
});
// Detectable shapes: sphere, cube, cuboid, triangle, ellipse, hexagon, polygon
// Detectable colors: red, orange, yellow, green, cyan, blue, purple, pink, white, gray, black

stopVisionAnalysis(analysisId)

Stop a specific vision analysis session.

Parameters:

  • analysisId (string) - Analysis session ID returned from startVisionAnalysis()

Returns: Promise<Object> - Stop result with completion information

JS
await oloClient.vision.stopVisionAnalysis(analysisId);

stopAllVisionAnalysis()

Stop all active vision analysis sessions.

Returns: Promise<void> - Resolves when all sessions are stopped

JS
await oloClient.vision.stopAllVisionAnalysis();

getVisionAnalysisResults(analysisId, options)

Get analysis results from a session.

Parameters:

  • analysisId (string) - Analysis session ID
  • options (object, optional) - Options for result retrieval
    • limit (number) - Maximum number of results to return
    • since (number) - Only return results after this timestamp

Returns: Promise<Array> - Array of analysis results with timestamps

JS
const results = await oloClient.vision.getVisionAnalysisResults(analysisId, {
  limit: 10,
  since: Date.now() - 30000 // Last 30 seconds
});
console.log('Recent results:', results);

getActiveVisionAnalysisSessions()

Get information about active analysis sessions from the Appliance.

Returns: Promise<Array> - Array of active session info objects

JS
const activeSessions = await oloClient.vision.getActiveVisionAnalysisSessions();
console.log('Active sessions:', activeSessions);
// Returns: [{ analysisId: 'uuid', topic: '/camera/topic', provider: 'yolo', frameCount: 150, duration: '30s' }]

Bounding Box Visualization

The vision system supports real-time bounding box overlays on video feeds for object detection providers.

Detection Result Structure:

JS
{
  type: 'object_detection',     // Type of analysis
  provider: 'yolo',             // Provider name
  timestamp: 1766503932894,     // Timestamp of analysis
  frameNumber: 15,              // Frame number in the analysis session
  detections: [
    {
      class: 'person',          // Object class name
      confidence: 0.87,         // Confidence score (0.0 to 1.0)
      bbox: {                   // Bounding box coordinates (original image pixels)
        x: 150, y: 200,         // Top-left corner
        width: 120, height: 180 // Dimensions
      }
    }
  ],
  totalObjects: 1,              // Total number of detected objects
  confidenceThreshold: 0.5,     // Confidence threshold used
  modelBase: 'mobilenet_v2',    // Model used for detection
  maxDetections: 20,            // Maximum detections setting
  processingTimeMs: 91,         // Processing time in milliseconds
  inputFrame: {                 // Original input frame dimensions
    width: 640, height: 480
  },
  analysisInput: {              // Dimensions used for analysis (may be downscaled)
    width: 480, height: 360
  }
}
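A result in this shape can be summarized entirely client-side, without further SDK calls. A small sketch that tallies detections per class (the sample result is illustrative):

```javascript
// Count detections per class in an object_detection result.
function countByClass(result) {
  const counts = {};
  for (const d of result.detections) {
    counts[d.class] = (counts[d.class] || 0) + 1;
  }
  return counts;
}

// Illustrative result following the structure above:
const sampleResult = {
  type: 'object_detection',
  detections: [
    { class: 'person', confidence: 0.87, bbox: { x: 150, y: 200, width: 120, height: 180 } },
    { class: 'person', confidence: 0.62, bbox: { x: 10, y: 40, width: 90, height: 160 } },
    { class: 'chair', confidence: 0.71, bbox: { x: 300, y: 220, width: 60, height: 80 } }
  ]
};
console.log(countByClass(sampleResult)); // { person: 2, chair: 1 }
```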

Automatic Rendering:

JS
// Enable built-in bounding boxes with custom styling
const analysisId = await oloClient.vision.startVisionAnalysis(topic, 'yolo', {
  showBoundingBoxes: true,
  videoElement: videoElement,
  boundingBoxOptions: {
    strokeWidth: 3,
    fontSize: 14,
    showConfidence: true,
    showLabels: true
  },
  onResult: (result) => {
    // Bounding boxes are automatically rendered on video
    console.log(`Detected ${result.detections.length} objects`);
  }
});

Custom Rendering:

JS
// Disable automatic rendering to implement your own
const analysisId = await oloClient.vision.startVisionAnalysis(topic, 'yolo', {
  showBoundingBoxes: false, // Disable automatic rendering
  onResult: (result) => {
    // Use raw detection data for custom visualization
    result.detections.forEach(detection => {
      const { bbox, class: className, confidence } = detection;

      // Scale coordinates from image size to video display size
      const videoRect = videoElement.getBoundingClientRect();
      const scaleX = videoRect.width / result.inputFrame.width;
      const scaleY = videoRect.height / result.inputFrame.height;

      const x = bbox.x * scaleX;
      const y = bbox.y * scaleY;
      const width = bbox.width * scaleX;
      const height = bbox.height * scaleY;

      // Draw your custom bounding box here
      drawCustomBoundingBox(x, y, width, height, className, confidence);
    });
  }
});

Important: Detection coordinates are in original image pixels, not video display pixels. You must scale them to match your video element's display size.
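This scaling step can be factored into a small pure helper so it is easy to test in isolation; a sketch using the bbox and inputFrame fields from the detection result:

```javascript
// Scale a bbox from original-image pixels to the video element's display size.
function scaleBBox(bbox, inputFrame, displayWidth, displayHeight) {
  const scaleX = displayWidth / inputFrame.width;
  const scaleY = displayHeight / inputFrame.height;
  return {
    x: bbox.x * scaleX,
    y: bbox.y * scaleY,
    width: bbox.width * scaleX,
    height: bbox.height * scaleY
  };
}

// A 640x480 frame rendered in a 320x240 video element: everything halves.
const scaled = scaleBBox(
  { x: 150, y: 200, width: 120, height: 180 },
  { width: 640, height: 480 },
  320, 240
);
console.log(scaled); // { x: 75, y: 100, width: 60, height: 90 }
```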

Performance Optimization

The vision system includes several performance optimizations for real-time analysis:

CPU Optimization:

  • Image downscaling before inference with coordinate scaling back to original size
  • Configurable maximum input image width via configureVisionProvider()
  • Non-overlapping analysis processing to prevent CPU overload
  • TensorFlow.js memory management with automatic tensor cleanup

Network Optimization:

  • Appliance-side processing reduces client CPU usage
  • Throttled ROS topic subscriptions to balance frame rate and performance
  • Configurable analysis intervals to control processing frequency
JS
// Optimize for low-CPU devices
await oloClient.vision.configureVisionProvider('yolo', {
  maxImageWidth: 320,      // Smaller input for faster processing
  confidenceThreshold: 0.7 // Higher threshold for fewer false positives
});

// Use longer intervals for better performance
const analysisId = await oloClient.vision.startVisionAnalysis(topic, 'yolo', {
  intervalMs: 600, // Analyze every 600ms instead of default 500ms
  showBoundingBoxes: true,
  videoElement: videoElement
});

Vision-Based Robot Control Example

JS
// Stop robot when person is detected using camera topic directly
const { bestTopic } = await oloClient.video.detectVideoTopics();

await oloClient.vision.startVisionAnalysis(bestTopic, 'yolo', {
  intervalMs: 500,
  onResult: async (result) => {
    const people = result.detections.filter(d => d.class === 'person');
    if (people.length > 0) {
      await oloClient.core.stopRobot();
      console.log('Person detected - stopping robot for safety');
    }
  }
});
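With a 500 ms analysis interval, the onResult callback above will re-issue the stop command on every frame while a person remains in view. A cooldown gate avoids that; a generic sketch, independent of the SDK (the injectable clock is only there to make the behavior deterministic):

```javascript
// Returns a function that yields true at most once per cooldownMs, so a
// persistent detection doesn't re-trigger the same action on every frame.
function makeCooldownGate(cooldownMs, now = Date.now) {
  let lastTrigger = -Infinity;
  return () => {
    const t = now();
    if (t - lastTrigger >= cooldownMs) {
      lastTrigger = t;
      return true;
    }
    return false;
  };
}

// Simulated clock to demonstrate the behavior:
let fakeTime = 0;
const gate = makeCooldownGate(2000, () => fakeTime);
console.log(gate()); // true  (first trigger)
fakeTime = 500;
console.log(gate()); // false (still cooling down)
fakeTime = 2500;
console.log(gate()); // true  (cooldown elapsed)
```

Inside the onResult callback, the stop command would then be guarded by `if (people.length > 0 && gate()) { ... }`.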