Vision Analysis
The vision module provides AI-powered computer vision analysis of robot camera feeds, with processing performed on the Appliance. Analysis can use multiple providers, including OpenAI GPT-4 Vision, COCO-SSD object detection, motion detection, and geometric shape detection with color recognition.
configureVisionProvider(providerName, config)
Configure a vision provider on the Appliance (e.g., set API key).
Parameters:
- providerName (string) - Provider name (e.g., 'openai-gpt4v', 'yolo', 'motion_detection')
- config (object) - Provider configuration
  - apiKey (string) - API key for providers that require it
  - model (string) - Model name for AI providers
  - maxTokens (number) - Max tokens for AI providers
  - sensitivity (number) - Sensitivity for motion detection (1-100, default: 15)
  - confidenceThreshold (number) - Confidence threshold for object detection (0.0-1.0, default: 0.5)
  - maxImageWidth (number) - Maximum input image width for performance optimization (default: 480)
  - modelBase (string) - YOLO model base: 'lite_mobilenet_v2' (fast, default), 'mobilenet_v1' (balanced), 'mobilenet_v2' (accurate)
  - maxDetections (number) - Maximum detections per frame (1-100, default: 20)
Returns: Promise<boolean> - True if provider was configured successfully
YOLO Configuration Options:
| Option | Default | Description |
|---|---|---|
| modelBase | 'lite_mobilenet_v2' | Model architecture: 'lite_mobilenet_v2' (fastest), 'mobilenet_v1' (balanced), 'mobilenet_v2' (most accurate) |
| confidenceThreshold | 0.5 | Minimum confidence score (0.25-0.35 for more detections, 0.5+ for fewer false positives) |
| maxDetections | 20 | Maximum objects detected per frame (up to 100 for crowded scenes) |
| maxImageWidth | 480 | Input image width (higher = better small object detection but slower; 640-800 for accuracy) |
// Configure OpenAI provider with API key
await oloClient.vision.configureVisionProvider('openai-gpt4v', {
apiKey: 'your-openai-api-key',
model: 'gpt-4o',
maxTokens: 300
});
// Configure YOLO for maximum accuracy (slower)
await oloClient.vision.configureVisionProvider('yolo', {
modelBase: 'mobilenet_v2', // Most accurate model
confidenceThreshold: 0.35, // Lower threshold = more detections
maxDetections: 50, // More objects per frame
maxImageWidth: 640 // Higher resolution for small objects
});
// Configure YOLO for maximum speed (default settings)
await oloClient.vision.configureVisionProvider('yolo', {
modelBase: 'lite_mobilenet_v2', // Fastest model (default)
confidenceThreshold: 0.5, // Balanced threshold (default)
maxDetections: 20, // Standard limit (default)
maxImageWidth: 480 // Fast processing (default)
});
// Configure motion detection sensitivity
await oloClient.vision.configureVisionProvider('motion_detection', {
sensitivity: 15
});
// Configure jsfeat shape detection (contour analysis + fill ratio, runs locally)
await oloClient.vision.configureVisionProvider('jsfeat_shapes', {
minArea: 400, // Minimum shape area in pixels (filters noise)
cannyLowThreshold: 25, // Canny edge detection low threshold
cannyHighThreshold: 100, // Canny edge detection high threshold
analysisWidth: 280 // Image width for processing
});
// Detects 3D objects: spheres, cubes, cuboids with color (red, blue, green, etc.)
JSFeat Shape Detection Options:
| Option | Default | Description |
|---|---|---|
| minArea | 400 | Minimum contour area in pixels (filters small noise) |
| maxAreaFraction | 0.8 | Maximum shape area as fraction of image (filters full-frame) |
| cannyLowThreshold | 25 | Canny edge detection low threshold |
| cannyHighThreshold | 100 | Canny edge detection high threshold |
| analysisWidth | 280 | Image width for processing (lower = faster) |
Detected Shape Classes: Each detection class is prefixed with a color, e.g., "red sphere", "blue cube", "green cuboid"
- sphere - Circular/spherical objects (high circularity, ~1:1 aspect ratio)
- ellipse - Oval/oblong shapes (high circularity, non-square aspect ratio)
- cube - Square-ish 3D objects (4-5 vertices, ~1:1 aspect ratio)
- cuboid - Rectangular 3D objects (4-7 vertices, rectangular aspect ratio)
- triangle - 3-vertex shapes
- hexagon - 6-vertex shapes
- polygon - Irregular shapes with 8+ vertices
Detected Colors:
red, orange, yellow, green, cyan, blue, purple, pink, white, gray, black
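Because each detection carries both a combined class string and structured properties, specific shape/color combinations are easy to pick out. The helper below is a minimal sketch that filters a detections array (shaped like the jsfeat_shapes results documented here) for red spheres above a confidence cutoff; the sample data and function name are illustrative, not part of the SDK.

```javascript
// Filter shape detections by color, shape, and confidence.
// Detections follow the jsfeat_shapes result structure documented above.
function findShapes(detections, { color, shape, minConfidence = 0.5 } = {}) {
  return detections.filter(d =>
    d.confidence >= minConfidence &&
    (!color || d.properties.color === color) &&
    (!shape || d.properties.shape === shape)
  );
}

// Illustrative sample data in the documented format
const sample = [
  { class: 'red sphere', confidence: 0.9, properties: { shape: 'sphere', color: 'red' } },
  { class: 'blue cube', confidence: 0.8, properties: { shape: 'cube', color: 'blue' } },
  { class: 'red sphere', confidence: 0.3, properties: { shape: 'sphere', color: 'red' } }
];

const redSpheres = findShapes(sample, { color: 'red', shape: 'sphere' });
console.log(redSpheres.length); // → 1 (the low-confidence detection is filtered out)
```

The same filter can be applied inside an onResult callback to react only to the shapes you care about.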
Detection Properties: Each detection includes additional properties:
{
class: "red sphere", // Color + shape
confidence: 0.85,
bbox: { x, y, width, height }, // Bounding box coordinates
properties: {
shape: "sphere", // Shape only
color: "red", // Color only
vertices: 12, // Simplified polygon vertices
circularity: 0.92, // 0.0-1.0, higher = more circular
area: 1250, // Area in pixels
aspectRatio: 1.02 // Width/height ratio
}
}
getAvailableVisionProviders()
Get available vision providers and their capabilities from the Appliance.
Returns: Promise<Array> - Array of provider info objects
const providers = await oloClient.vision.getAvailableVisionProviders();
console.log('Available providers:', providers);
// Returns: [{ name: 'yolo', displayName: 'COCO-SSD Object Detection', capabilities: [...], requiresApiKey: false, isConfigured: true, isCustom: false }]
registerCustomProvider(providerName, providerConfig)
Register a custom vision provider with user-defined analysis logic. The provider will persist in appliance memory until unregistered or appliance restart. Custom providers can be used with startVisionAnalysis() just like built-in providers.
Parameters:
- providerName (string) - Unique name for the provider (cannot override built-in providers: 'yolo', 'motion_detection', 'openai-gpt4v', 'jsfeat_shapes')
- providerConfig (object) - Provider configuration
  - analyzeFrame (function) - Analysis function: async (imageBuffer, config) => result
    - imageBuffer (Buffer/bytes) - JPEG image data
    - config (object) - Configuration passed from startVisionAnalysis
    - Returns: An object with analysis results
  - displayName (string, optional) - Human-readable name for the provider
  - description (string, optional) - Description of what the provider does
  - capabilities (string[], optional) - List of capabilities (e.g., ['object_detection', 'bounding_boxes'])
Returns: Promise<boolean> - True if provider was registered successfully
Notes:
- JavaScript sandbox globals: Buffer, sharp (image processing), Math, JSON, Date, console, Promise, setTimeout, setImmediate
- Python: Functions are executed in-process in the SDK Playground. For external scripts, the function must be defined in a source file. Any Python packages installed on the appliance can be used (PIL, OpenCV, numpy, etc.).
Example - Brightness Detection:
// Register a custom brightness detector
await oloClient.vision.registerCustomProvider('brightness_detector', {
displayName: 'Brightness Detector',
description: 'Analyzes average image brightness',
capabilities: ['custom_analysis'],
analyzeFrame: async function(imageBuffer, config) {
// Use sharp (available in sandbox) to process image
const { data, info } = await sharp(imageBuffer)
.resize(160, 120)
.grayscale()
.raw()
.toBuffer({ resolveWithObject: true });
// Calculate average brightness
let total = 0;
for (let i = 0; i < data.length; i++) {
total += data[i];
}
const avgBrightness = total / data.length;
return {
brightness: avgBrightness,
isLight: avgBrightness > 128,
message: avgBrightness > 128 ? 'Scene is well lit' : 'Scene is dark'
};
}
});
// Use the custom provider like any built-in provider
const analysisId = await oloClient.vision.startVisionAnalysis(cameraTopic, 'brightness_detector', {
intervalMs: 1000,
onResult: (result) => {
console.log(`Brightness: ${result.brightness.toFixed(1)}, ${result.message}`);
}
});
Example - Color Region Detection with Bounding Boxes:
// Register a custom color detector that returns bounding boxes
await oloClient.vision.registerCustomProvider('red_detector', {
displayName: 'Red Region Detector',
description: 'Detects red-colored regions in the image',
capabilities: ['object_detection', 'bounding_boxes'],
analyzeFrame: async function(imageBuffer, config) {
const { data, info } = await sharp(imageBuffer)
.resize(320, 240)
.raw()
.toBuffer({ resolveWithObject: true });
const detections = [];
const width = info.width;
const height = info.height;
const channels = info.channels;
// Grid-based red region detection
const gridSize = 32;
for (let gy = 0; gy < height; gy += gridSize) {
for (let gx = 0; gx < width; gx += gridSize) {
let redCount = 0;
let totalPixels = 0;
for (let y = gy; y < Math.min(gy + gridSize, height); y++) {
for (let x = gx; x < Math.min(gx + gridSize, width); x++) {
const idx = (y * width + x) * channels;
const r = data[idx];
const g = data[idx + 1];
const b = data[idx + 2];
// Check if pixel is "red"
if (r > 150 && r > g * 1.5 && r > b * 1.5) {
redCount++;
}
totalPixels++;
}
}
// If more than 30% of cell is red, add detection
if (redCount / totalPixels > 0.3) {
detections.push({
class: 'red_region',
confidence: redCount / totalPixels,
bbox: {
x: gx,
y: gy,
width: gridSize,
height: gridSize
}
});
}
}
}
// Return object_detection type to enable bounding box rendering
return {
type: 'object_detection',
detections: detections,
inputFrame: { width, height }
};
}
});
// Use with bounding boxes
const analysisId = await oloClient.vision.startVisionAnalysis(cameraTopic, 'red_detector', {
intervalMs: 500,
showBoundingBoxes: true,
videoElement: videoElement,
onResult: (result) => {
if (result.detections.length > 0) {
console.log(`Detected ${result.detections.length} red region(s)`);
}
}
});
unregisterCustomProvider(providerName)
Unregister a custom vision provider. Cannot unregister built-in providers or providers currently in use.
Parameters:
providerName(string) - Name of the provider to unregister
Returns: Promise<boolean> - True if unregistered successfully
// JavaScript
await oloClient.vision.unregisterCustomProvider('brightness_detector');
startVisionAnalysis(topic, providerName, config)
Start vision analysis on a camera topic using Appliance-side processing.
Parameters:
- topic (string) - Camera topic to analyze (e.g., '/camera/image_raw/compressed')
- providerName (string) - Name of vision provider to use
- config (object, optional) - Analysis configuration
  - onResult (function) - Callback for analysis results
  - onError (function) - Callback for analysis errors
  - onStopped (function) - Callback when analysis is stopped
  - prompt (string) - Optional prompt for LLM providers
  - intervalMs (number) - Analysis interval in milliseconds (default: 500)
  - showBoundingBoxes (boolean) - Enable visual bounding boxes (for object detection)
  - boundingBoxOptions (object) - Options for bounding box rendering
  - videoElement (HTMLVideoElement) - Video element for bounding box rendering (required if showBoundingBoxes is true). Note: You must start the video stream with startVideo() before starting vision analysis for bounding boxes to display.
  - confidenceThreshold (number) - Per-call override for object detection confidence (0.0-1.0)
  - maxDetections (number) - Per-call override for max detections per frame (1-100)
  - maxImageWidth (number) - Per-call override for input image width
Returns: Promise<string> - Analysis session ID
// Auto-detect best camera topic
const { bestTopic } = await oloClient.video.detectVideoTopics();
// Start motion detection
const motionAnalysisId = await oloClient.vision.startVisionAnalysis(bestTopic, 'motion_detection', {
intervalMs: 1000,
onResult: (result) => {
if (result.motion_detected) {
console.log(`Motion level: ${result.motion_level}%`);
}
}
});
// Start object detection with visual bounding boxes
const objectAnalysisId = await oloClient.vision.startVisionAnalysis(bestTopic, 'yolo', {
intervalMs: 400,
showBoundingBoxes: true,
videoElement: videoElement,
boundingBoxOptions: {
strokeWidth: 3,
fontSize: 14,
showConfidence: true,
showLabels: true
},
// Per-call overrides (optional - these override provider defaults for this session):
// confidenceThreshold: 0.4, // Lower threshold to detect more objects
// maxDetections: 50, // Allow more detections per frame
// maxImageWidth: 640, // Higher resolution for small objects
onResult: (result) => {
result.detections.forEach(detection => {
console.log(`Found ${detection.class} (${(detection.confidence * 100).toFixed(1)}%)`);
});
}
});
// Start OpenAI GPT-4 Vision analysis
const aiAnalysisId = await oloClient.vision.startVisionAnalysis(bestTopic, 'openai-gpt4v', {
prompt: 'Describe what the robot should be aware of for navigation',
intervalMs: 5000,
onResult: (result) => {
console.log('AI Analysis:', result.content);
}
});
// Start jsfeat shape detection with color (spheres, cubes, cuboids, etc.)
// Uses Canny edge detection + contour analysis + fill ratio - runs locally, no ML model required
const shapeAnalysisId = await oloClient.vision.startVisionAnalysis(bestTopic, 'jsfeat_shapes', {
intervalMs: 500, // Fast - no neural network inference
showBoundingBoxes: true,
videoElement: videoElement,
// Optional per-call overrides:
// minArea: 300, // Detect smaller shapes
// cannyLowThreshold: 20, // Lower = more edges detected
// cannyHighThreshold: 80, // Upper threshold for edge detection
onResult: (result) => {
result.detections.forEach(detection => {
// detection.class includes color, e.g., "red sphere", "blue cube", "green cuboid"
console.log(`Found ${detection.class} (${(detection.confidence * 100).toFixed(1)}%)`);
console.log(` Position: x=${detection.bbox.x}, y=${detection.bbox.y}`);
console.log(` Size: ${detection.bbox.width}x${detection.bbox.height}`);
console.log(` Shape: ${detection.properties.shape}, Color: ${detection.properties.color}`);
});
}
});
// Detectable shapes: sphere, cube, cuboid, triangle, ellipse, hexagon, polygon
// Detectable colors: red, orange, yellow, green, cyan, blue, purple, pink, white, gray, black
stopVisionAnalysis(analysisId)
Stop a specific vision analysis session.
Parameters:
analysisId(string) - Analysis session ID returned from startVisionAnalysis()
Returns: Promise<Object> - Stop result with completion information
await oloClient.vision.stopVisionAnalysis(analysisId);
stopAllVisionAnalysis()
Stop all active vision analysis sessions.
Returns: Promise<void> - Resolves when all sessions are stopped
await oloClient.vision.stopAllVisionAnalysis();
getVisionAnalysisResults(analysisId, options)
Get analysis results from a session.
Parameters:
- analysisId (string) - Analysis session ID
- options (object, optional) - Options for result retrieval
  - limit (number) - Maximum number of results to return
  - since (number) - Only return results after this timestamp
Returns: Promise<Array> - Array of analysis results with timestamps
const results = await oloClient.vision.getVisionAnalysisResults(analysisId, {
limit: 10,
since: Date.now() - 30000 // Last 30 seconds
});
console.log('Recent results:', results);
getActiveVisionAnalysisSessions()
Get information about active analysis sessions from the Appliance.
Returns: Promise<Array> - Array of active session info objects
const activeSessions = await oloClient.vision.getActiveVisionAnalysisSessions();
console.log('Active sessions:', activeSessions);
// Returns: [{ analysisId: 'uuid', topic: '/camera/topic', provider: 'yolo', frameCount: 150, duration: '30s' }]
Bounding Box Visualization
The vision system supports real-time bounding box overlays on video feeds for object detection providers.
Detection Result Structure:
{
type: 'object_detection', // Type of analysis
provider: 'yolo', // Provider name
timestamp: 1766503932894, // Timestamp of analysis
frameNumber: 15, // Frame number in the analysis session
detections: [
{
class: 'person', // Object class name
confidence: 0.87, // Confidence score (0.0 to 1.0)
bbox: { // Bounding box coordinates (original image pixels)
x: 150, y: 200, // Top-left corner
width: 120, height: 180 // Dimensions
}
}
],
totalObjects: 1, // Total number of detected objects
confidenceThreshold: 0.5, // Confidence threshold used
modelBase: 'mobilenet_v2', // Model used for detection
maxDetections: 20, // Maximum detections setting
processingTimeMs: 91, // Processing time in milliseconds
inputFrame: { // Original input frame dimensions
width: 640,
height: 480
},
analysisInput: { // Dimensions used for analysis (may be downscaled)
width: 480,
height: 360
}
}
Automatic Rendering:
// Enable built-in bounding boxes with custom styling
const analysisId = await oloClient.vision.startVisionAnalysis(topic, 'yolo', {
showBoundingBoxes: true,
videoElement: videoElement,
boundingBoxOptions: {
strokeWidth: 3,
fontSize: 14,
showConfidence: true,
showLabels: true
},
onResult: (result) => {
// Bounding boxes are automatically rendered on video
console.log(`Detected ${result.detections.length} objects`);
}
});
Custom Rendering:
// Disable automatic rendering to implement your own
const analysisId = await oloClient.vision.startVisionAnalysis(topic, 'yolo', {
showBoundingBoxes: false, // Disable automatic rendering
onResult: (result) => {
// Use raw detection data for custom visualization
result.detections.forEach(detection => {
const { bbox, class: className, confidence } = detection;
// Scale coordinates from image size to video display size
const videoRect = videoElement.getBoundingClientRect();
const scaleX = videoRect.width / result.inputFrame.width;
const scaleY = videoRect.height / result.inputFrame.height;
const x = bbox.x * scaleX;
const y = bbox.y * scaleY;
const width = bbox.width * scaleX;
const height = bbox.height * scaleY;
// Draw your custom bounding box here
drawCustomBoundingBox(x, y, width, height, className, confidence);
});
}
});
Important: Detection coordinates are in original image pixels, not video display pixels. You must scale them to match your video element's display size.
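The scaling step from the custom-rendering example can be factored into a small pure helper. This is a sketch assuming the documented bbox and inputFrame shapes; the function name is illustrative, and displayWidth/displayHeight would typically come from the video element's getBoundingClientRect().

```javascript
// Scale a bbox from original image pixels to video display pixels.
// bbox and inputFrame follow the detection result structure documented above.
function scaleBBoxToDisplay(bbox, inputFrame, displayWidth, displayHeight) {
  const scaleX = displayWidth / inputFrame.width;
  const scaleY = displayHeight / inputFrame.height;
  return {
    x: bbox.x * scaleX,
    y: bbox.y * scaleY,
    width: bbox.width * scaleX,
    height: bbox.height * scaleY
  };
}

// A 640x480 frame shown in a 320x240 video element halves every coordinate
const scaled = scaleBBoxToDisplay(
  { x: 150, y: 200, width: 120, height: 180 },
  { width: 640, height: 480 },
  320, 240
);
console.log(scaled); // → { x: 75, y: 100, width: 60, height: 90 }
```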
Performance Optimization
The vision system includes several performance optimizations for real-time analysis:
CPU Optimization:
- Image downscaling before inference with coordinate scaling back to original size
- Configurable maximum input image width via configureVisionProvider()
- TensorFlow.js memory management with automatic tensor cleanup
Network Optimization:
- Appliance-side processing reduces client CPU usage
- Throttled ROS topic subscriptions to balance frame rate and performance
- Configurable analysis intervals to control processing frequency
// Optimize for low-CPU devices
await oloClient.vision.configureVisionProvider('yolo', {
maxImageWidth: 320, // Smaller input for faster processing
confidenceThreshold: 0.7 // Higher threshold for fewer false positives
});
// Use longer intervals for better performance
const analysisId = await oloClient.vision.startVisionAnalysis(topic, 'yolo', {
intervalMs: 600, // Analyze every 600ms instead of default 500ms
showBoundingBoxes: true,
videoElement: videoElement
});
Vision-Based Robot Control Example
// Stop robot when person is detected using camera topic directly
const { bestTopic } = await oloClient.video.detectVideoTopics();
await oloClient.vision.startVisionAnalysis(bestTopic, 'yolo', {
intervalMs: 500,
onResult: async (result) => {
const people = result.detections.filter(d => d.class === 'person');
if (people.length > 0) {
await oloClient.core.stopRobot();
console.log('Person detected - stopping robot for safety');
}
}
});
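A single frame can produce a false positive, so safety logic often waits for the same class across several consecutive frames before acting. The debouncer below is a minimal, provider-agnostic sketch (the frame-count threshold and class names are illustrative); the returned function could be called from the onResult callback above in place of acting on every frame.

```javascript
// Track consecutive frames containing a target class; fire the callback
// only once the class has been present for N frames in a row.
function makeDetectionDebouncer(targetClass, requiredFrames, onConfirmed) {
  let streak = 0;
  return function onFrame(detections) {
    const present = detections.some(d => d.class === targetClass);
    streak = present ? streak + 1 : 0;
    if (streak === requiredFrames) {
      onConfirmed();
    }
  };
}

// Fires once 'person' has been seen in 3 consecutive frames
let stopped = false;
const onFrame = makeDetectionDebouncer('person', 3, () => { stopped = true; });

onFrame([{ class: 'person' }]); // streak 1
onFrame([]);                    // likely false positive: streak resets to 0
onFrame([{ class: 'person' }]); // streak 1
onFrame([{ class: 'person' }]); // streak 2
onFrame([{ class: 'person' }]); // streak 3 -> confirmed
console.log(stopped); // → true
```

The streak comparison uses === rather than >= so the callback fires exactly once per sustained appearance instead of on every subsequent frame.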