Camera Images

The Leap Motion controller uses infrared stereo cameras as tracking sensors. You can access the images from these cameras using the Controller.images or Frame.images functions. Both return an ImageList object containing Image objects. Controller.images provides the most recent set of images. Frame.images provides the set of images analysed to create that frame, which can be slightly older than the images returned by the Controller directly.

An image from one of the cameras. A grid highlighting the significant, complex distortion is superimposed on the image.

The images can be used for:

  • Head-mounted display video pass-through
  • Augmented reality
  • Computer vision

The Image API provides a buffer containing the sensor brightness values and a buffer containing the camera calibration map, which can be used to correct lens distortion and other optical imperfections in the image data.

Image API Basics

Get ImageList objects from either Controller.images or Frame.images. The Controller.images function gives you the most recent images. Frame.images gives you the images associated with that frame. Since processing the frame takes a bit of time, the images from the frame will be at least one camera frame behind the images obtained from the controller. (In a future version, the data frame rate may be decoupled from the camera frame rate, so the difference could be larger.) Images from the controller have the smallest latency, but won’t match up as well to the tracking data of the current frame. When using Controller.images, you can implement the on_images() callback in a Listener object. Your Listener.on_images() callback is invoked by the Controller as soon as a new set of images is ready.

Image data is provided as an array of pixel values. The format of this data is reported by the Image.format value. Currently, one format is in use. This “INFRARED” format uses one byte per pixel, defining the brightness measured for that sensor location. You can display infrared-format data as a greyscale image. Future Leap Motion hardware may provide sensor image data in a different format.
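Since INFRARED data is one brightness byte per pixel, a quick way to inspect a frame outside your application is to wrap it in a binary PGM (greyscale) file header. This is a minimal sketch. It assumes you have already copied the pixel bytes into a Python sequence together with the image width and height. The helper name infrared_to_pgm is hypothetical and not part of the Leap Motion API.

```python
def infrared_to_pgm(data, width, height):
    """Wrap 8-bit INFRARED pixel data in a binary PGM (P5) greyscale image."""
    expected = width * height  # INFRARED format: exactly one byte per pixel
    if len(data) != expected:
        raise ValueError("expected %d bytes, got %d" % (expected, len(data)))
    # PGM header: magic number, dimensions, maximum grey value.
    header = ("P5\n%d %d\n255\n" % (width, height)).encode("ascii")
    return header + bytes(bytearray(data))
```

Writing the returned bytes to a file with a .pgm extension gives an image that most image viewers can open.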

Image Distortion

When a ray of light enters one of the Leap Motion cameras, the lens bends the light ray so that it hits the sensor, which records it as a greyscale brightness value at a specific pixel location. Of course, no lens is perfect, so a ray of light does not land on the sensor in the optically perfect spot. The calibration map provides data to correct this imperfection, allowing you to calculate the true angle of the original ray of light. You can use the corrected angle to generate a distortion-free image, and, using the angles from both images in the stereo pair, you can triangulate the 3D location of a feature identified in both images. Note that the calibration map corrects lens distortion; it does not correct perspective distortion.

For image correction, the distortion data can be fed to a shader program that efficiently interpolates the correction applied to rays of light. To get the true angle for a small set of points, you can use the Image.rectify() and Image.warp() functions (but they are not efficient enough to transform a full bitmap at a high frame rate).

The distortion data is based on the angle of view of the Leap Motion cameras. The Image class provides the functions Image.ray_scale_x and Image.ray_scale_y, which return values proportional to view angles large enough to ensure that the distortion map covers the entire view: about 150 degrees for the current Leap Motion peripheral. A 150 degree angle of view means that a light ray passing through the lens has a maximum slope of about 4/1.

A view angle of 150 degrees corresponds to a slope of ±4 (the tangent of 75 degrees is approximately 4)
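The relationship between view angle and maximum slope is the tangent of half the angle. The small helper below uses a hypothetical name and is not an SDK function. It makes the arithmetic explicit: for a 150 degree view, tan(75°) is about 3.73, which the text rounds to 4.

```python
import math


def max_slope_for_view_angle(view_angle_degrees):
    """Largest ray slope for a symmetric field of view (tangent of half the angle)."""
    # A ray at the edge of the view is tilted by half the full angle from the
    # optical axis; its slope (sideways distance per unit of depth) is the tangent.
    return math.tan(math.radians(view_angle_degrees / 2.0))


print(round(max_slope_for_view_angle(150), 3))  # 3.732
```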

The image above shows a reconstruction of the distortion-corrected image data. The brightness value of each pixel in the image originated from a ray of light entering the camera from a specific direction. The image is reconstructed by calculating the horizontal and vertical slopes represented by each pixel and finding the true brightness value from the image data using the calibration map. The red portions of the image represent areas within the rendering for which no brightness value is available (the actual field of view is less than 150 degrees).

Image Orientation

The top of the image is always toward the negative direction of the z-axis of the Leap Motion coordinate system. By default, the Leap Motion software automatically adjusts the coordinate system so that hands enter from the positive direction of the z-axis. (Users can disable auto-orientation using the Leap Motion control panel.) Before hands are inserted into the field of view, it isn't possible to know which way the images are oriented, since the user can typically place or mount the device in either physical orientation (i.e. with the green LED on the long side of the device facing one way or the other). If the user places the device the opposite way from what you expect, the images will be upside down until they put their hands into view (or turn the device itself around).

Get the Raw Images

Before you can get image data, you must set the POLICY_IMAGES flag using the Controller.set_policy() function. For privacy reasons, each user must also enable the feature in the Leap Motion control panel for any application to get the raw camera images.

To get the image data, use either the Controller.images or the Frame.images function. Since the Leap Motion peripheral has two cameras, these functions return an ImageList object that contains two images (this could change in the future if multiple Leap Motion devices can be active at the same time). The image at index 0 is the left camera; the image at index 1 is the right camera. Note that the left-right orientation of the peripheral can be detected automatically based on the direction from which the user inserts his or her hand into the field of view. Detection is enabled by the auto-orientation setting in the Leap Motion control panel.

Once you have an Image object, you can get the 8-bit brightness values from the data buffer. The length of this buffer is Image.width times Image.height times Image.bytes_per_pixel. The width and height of the image change with the current operating mode of the controller, which can change from frame to frame. Note that in "robust mode," the images are half as tall.
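Because the image size can change between frames, read the width and height from each Image object rather than caching them. The sketch below assumes the buffer is stored row by row (row-major). This matches how the later OpenCV example reshapes the data to (height, width). The function names are hypothetical.

```python
def expected_length(width, height, bytes_per_pixel=1):
    """Size in bytes of the raw image buffer for the current operating mode."""
    return width * height * bytes_per_pixel


def brightness_at(data, width, height, x, y, bytes_per_pixel=1):
    """Return the first byte of pixel (x, y) from a row-major image buffer."""
    if not (0 <= x < width and 0 <= y < height):
        raise IndexError("pixel (%d, %d) is outside a %dx%d image" % (x, y, width, height))
    # Each row occupies width * bytes_per_pixel bytes; rows follow one another.
    return data[(y * width + x) * bytes_per_pixel]
```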

Get the Calibration Map

The calibration map can be used to correct image distortion due to lens curvature and other imperfections. The map is a 64x64 grid of points. Each point consists of two 32-bit values, so the buffer size is 128 times 64 times 4 bytes (32,768 bytes). You can get the calibration map buffer using the Image.distortion function.

Each point in the buffer indicates where to find the corrected brightness value for the corresponding pixel in the raw image. Valid coordinates are normalized in the range [0..1]. Individual elements of the calibration map can have a value in the range [-0.6..2.3], but coordinates below zero or above 1 are invalid. Discard values outside the range [0..1] when using the calibration data.

To convert to pixel coordinates, multiply by the width or height of the image. For pixels that lie in between the calibration grid points, you can interpolate between the nearest grid points. The camera lenses have a very large angle of view (roughly 150 degrees) and have a large amount of distortion. Because of this, not every point in the calibration grid maps to a valid pixel. The following rendering shows the lens correction data as color values. The left image shows the x values; the right side shows the y values.

The red values indicate map values that fall outside the image.
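The validity check and the conversion to pixel coordinates described above fit in a few lines. This is a sketch with a hypothetical helper name. It returns None for map values outside [0..1]. Otherwise it returns floating-point pixel coordinates, which you can round or interpolate as needed.

```python
def calibration_to_pixel(u, v, width, height):
    """Convert a normalized calibration-map coordinate to raw-image pixel coordinates.

    Returns None when either coordinate is outside [0..1]: no brightness was
    recorded for that point.
    """
    if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
        return None
    # Denormalize: u scales with the image width, v with the image height.
    return u * width, v * height
```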

The size of the calibration map is subject to change in the future, so the Image class provides the grid dimensions with the distortion_width (actually twice the width to account for two values per grid point) and distortion_height functions. The length of the buffer containing the calibration data is distortion_width times distortion_height times 4 bytes.
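To read a single grid point without hard-coding the 64x64 size, index the buffer with distortion_width. This sketch makes two assumptions: storage is row-major, and the x value is stored before the y value for each point. The OpenCV example later in this section relies on the same layout (even indices are x, odd indices are y). Both helper names are hypothetical.

```python
def grid_point(distortion, distortion_width, distortion_height, col, row):
    """Return the (x, y) calibration values stored for grid point (col, row)."""
    points_per_row = distortion_width // 2  # distortion_width counts two values per point
    if not (0 <= col < points_per_row and 0 <= row < distortion_height):
        raise IndexError("grid point outside the calibration map")
    i = row * distortion_width + 2 * col
    return distortion[i], distortion[i + 1]


def distortion_buffer_bytes(distortion_width, distortion_height):
    """Buffer length in bytes: every value is a 32-bit (4-byte) float."""
    return distortion_width * distortion_height * 4
```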

Image Ray Correction

You can correct the raw image distortion in two ways:

  • Using the Image.warp() and Image.rectify() functions
  • Using the distortion buffer in a GPU shader program

The warp() and rectify() functions are the simpler method, but processing each pixel individually on the CPU is relatively slow. Use these functions if you are only correcting a few points, you don't need to process data in real time, or you cannot use GPU shaders. The distortion buffer is designed to be used with a GPU shader program and can correct the entire raw image while maintaining a good application frame rate.

Correction using Image.warp()

Image.warp() takes a ray direction and returns the pixel coordinates into the raw image data that specify the brightness value recorded for that ray direction.
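The per-pixel approach looks roughly like the sketch below. For each pixel of the corrected target image, it computes the ray slopes that pixel represents, asks warp() where that ray landed in the raw image, and copies the brightness value. To keep the sketch self-contained and testable, the warp parameter accepts any callable that takes (horizontal slope, vertical slope) and returns (x, y) pixel coordinates or None. Wrapping Image.warp() in such a callable depends on your SDK version and is an assumption left to you. The ±4 slope range follows the 150 degree view angle discussed earlier. The function name is hypothetical.

```python
import math


def undistort_with_warp(warp, raw, raw_width, raw_height,
                        target_width, target_height, max_slope=4.0, fill=0):
    """Build a corrected image by calling warp() once per target pixel (slow, CPU only).

    warp(h_slope, v_slope) must return raw-image pixel coordinates (x, y), or None
    when that ray direction has no recorded data. It stands in for Image.warp().
    """
    out = [fill] * (target_width * target_height)
    for ty in range(target_height):
        # Centre of this target row, spread evenly over [-max_slope, max_slope].
        v_slope = ((ty + 0.5) / target_height * 2.0 - 1.0) * max_slope
        for tx in range(target_width):
            h_slope = ((tx + 0.5) / target_width * 2.0 - 1.0) * max_slope
            result = warp(h_slope, v_slope)
            if result is None:
                continue  # no brightness recorded for this ray
            px, py = int(math.floor(result[0])), int(math.floor(result[1]))
            if 0 <= px < raw_width and 0 <= py < raw_height:
                out[ty * target_width + tx] = raw[py * raw_width + px]
    return out
```

Pixels whose rays fall outside the recorded data keep the fill value, which corresponds to the red areas in the reconstruction shown earlier.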

Correction using Shaders

A more efficient way to correct the entire image is to use a GPU shader program. Pass the image data to a fragment shader as a normal texture and the distortion data as encoded textures. You can then texture a quad by decoding the distortion data and using that to look up the correct brightness value in the image texture.
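Before you upload the distortion data as textures, it is usually convenient to split the interleaved buffer into a horizontal (x) grid and a vertical (y) grid, one per texture. The sketch below uses only indexing on the buffer, so it should work with any indexable sequence. It assumes row-major storage with x before y for each point. The function name is hypothetical.

```python
def split_distortion_planes(distortion, distortion_width, distortion_height):
    """Split the interleaved calibration buffer into separate x and y grids."""
    points_per_row = distortion_width // 2  # two values (x, y) per grid point
    xs = [[distortion[row * distortion_width + 2 * col]          # even entries: x
           for col in range(points_per_row)]
          for row in range(distortion_height)]
    ys = [[distortion[row * distortion_width + 2 * col + 1]      # odd entries: y
           for col in range(points_per_row)]
          for row in range(distortion_height)]
    return xs, ys
```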


Encoding Distortion Data in a 32-bit ARGB Texture

If a 32-bit-per-component texture format is not available on your target platform, you can use a separate texture for the x and y lookup values and encode the floating point values into multiple 8-bit color components. You then have to decode the values before using them to look up the raw brightness values.

A common method for encoding floating point data in a texture is to decompose the input value into four lower-precision values and then restore them in the shader. For example, you can encode a floating point number into a Color object that has four 8-bit components as follows:

Color encodeFloatRGBA(float input) {
    input = (input + 0.6f) / 2.9f; //scale the input range [-0.6..2.3] to [0..1]
    float r = input;
    float g = input * 255;
    float b = input * 255 * 255;
    float a = input * 255 * 255 * 255;

    //keep only the fractional part of each scaled value
    r = r - (float)Math.floor(r);
    g = g - (float)Math.floor(g);
    b = b - (float)Math.floor(b);
    a = a - (float)Math.floor(a);

    //remove the low-order part that the next component already stores
    //(clamping absorbs tiny negative rounding errors)
    r = Math.max(0f, r - g / 255);
    g = Math.max(0f, g - b / 255);
    b = Math.max(0f, b - a / 255);

    return new Color(r, g, b, a);
}

To recompose the value in the fragment shader, look up the value in the texture and perform the reciprocal operation. To avoid losing too much precision, encode the x and y distortion values in separate textures. Once the distortion indices are sampled from the textures and decoded, you can look up the correct brightness value from the camera image texture.

uniform sampler2D texture;      //raw camera image
uniform sampler2D vDistortion;  //encoded vertical (y) distortion values
uniform sampler2D hDistortion;  //encoded horizontal (x) distortion values

varying vec2 distortionLookup;  //quad coordinate used to sample the distortion textures
varying vec4 vertColor;
varying vec4 vertTexCoord;

const vec4 decoderCoefficients = vec4(1.0, 1.0/255.0, 1.0/(255.0*255.0), 1.0/(255.0*255.0*255.0));

void main() {
  vec4 vEncoded = texture2D(vDistortion, distortionLookup);
  vec4 hEncoded = texture2D(hDistortion, distortionLookup);
  //decode to [0..1], then undo the encoder's scaling back to [-0.6..2.3]
  float vIndex = dot(vEncoded, decoderCoefficients) * 2.9 - 0.6;
  float hIndex = dot(hEncoded, decoderCoefficients) * 2.9 - 0.6;

  if(vIndex >= 0.0 && vIndex <= 1.0
        && hIndex >= 0.0 && hIndex <= 1.0) {
      gl_FragColor = texture2D(texture, vec2(hIndex, vIndex)) * vertColor;
  } else {
      gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0); //show invalid pixels as red
  }
}
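Before you move an encoder/decoder pair onto the GPU, it can help to prototype it on the CPU. The Python sketch below follows the packing idea described above: it takes the fractional parts of the value scaled by powers of 255 and removes each channel's low-order part, because the next channel stores it. It also quantizes each channel to 8 bits, as a texture would. It scales the documented range [-0.6..2.3] onto [0..1) before encoding. The function names are hypothetical.

```python
import math

LOW, HIGH = -0.6, 2.3  # documented range of calibration map values


def encode_float_rgba(value):
    """Pack a calibration value into four 8-bit channels (integers 0..255)."""
    v = (value - LOW) / (HIGH - LOW)             # scale to [0..1]
    v = min(max(v, 0.0), 1.0 - 1e-7)             # stay below 1.0 so frac() cannot wrap
    scaled = [v * 255.0 ** k for k in range(4)]
    frac = [s - math.floor(s) for s in scaled]
    for k in range(3):                           # drop what the next channel stores
        frac[k] -= frac[k + 1] / 255.0
    return [int(round(f * 255.0)) for f in frac]


def decode_float_rgba(channels):
    """Reverse encode_float_rgba, like the shader's dot() with decoderCoefficients."""
    v = sum((c / 255.0) / 255.0 ** k for k, c in enumerate(channels))
    return v * (HIGH - LOW) + LOW
```

Because each of the first three channels holds an exact multiple of 1/255, only the last channel is affected by 8-bit rounding, so the round trip stays very close to the input.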

Correction using bilinear interpolation

In situations where shaders are not feasible, you may be able to correct image distortion faster with well-optimized bilinear interpolation than with the warp() function. (As with any such optimization, verify your results with performance testing.)

Recall that the distortion map contains a 64x64 element grid. Imagine these grid elements evenly spread out over your target image (with element [0, 0] in the lower-left corner and [63, 63] in the upper-right). Each element contains a horizontal coordinate and a vertical coordinate identifying where in the sensor image data to find the recorded brightness for that pixel in the target image. To find the brightness values for pixels in between the distortion grid elements, interpolate between the four nearest grid points.

The base algorithm for finding the distortion-corrected brightness for a given pixel in the target image is:

  1. Find the four points in the calibration grid surrounding the target pixel.
  2. Calculate the interpolation weights based on the distance of the target to each surrounding grid point.
  3. Look up the horizontal and vertical values at each of the four grid elements.
  4. Bilinearly interpolate the horizontal value using the distance-based weighting factors.
  5. Repeat this interpolation for the vertical value.
  6. Reject any points where either the horizontal or vertical value is outside of the range [0..1]. There is no recorded data for such points.
  7. Denormalize the values so that they represent pixel coordinates into the raw sensor data.
  8. Look up the sensor value at the computed pixel coordinates.
  9. Set this brightness value at the original coordinates in the target image.
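The nine steps above can be sketched in plain Python. This is a readable reference rather than a fast implementation. It assumes the interleaved, row-major layout used elsewhere in this section and takes grid_width as the number of points per row (distortion_width // 2). For simplicity, it maps grid row 0 to the first row of the output; if you follow the convention above, with element [0, 0] at the lower-left, flip the output rows. The function name is hypothetical.

```python
def undistort_bilinear(raw, raw_width, raw_height, distortion,
                       grid_width, grid_height, target_width, target_height, fill=0):
    """Distortion-correct an 8-bit image on the CPU using bilinear interpolation."""
    if grid_width < 2 or grid_height < 2:
        raise ValueError("the calibration grid needs at least 2x2 points")

    def value(row, col, offset):  # offset 0 -> horizontal, 1 -> vertical
        return distortion[2 * (row * grid_width + col) + offset]

    out = [fill] * (target_width * target_height)
    for ty in range(target_height):
        # Steps 1-2: spread the grid evenly over the target; find the cell and weight.
        gy = ty * (grid_height - 1) / float(max(target_height - 1, 1))
        row0 = min(int(gy), grid_height - 2)
        fy = gy - row0
        for tx in range(target_width):
            gx = tx * (grid_width - 1) / float(max(target_width - 1, 1))
            col0 = min(int(gx), grid_width - 2)
            fx = gx - col0
            coords = []
            for offset in (0, 1):  # Steps 3-5: interpolate horizontal, then vertical
                top = (value(row0, col0, offset) * (1 - fx)
                       + value(row0, col0 + 1, offset) * fx)
                bottom = (value(row0 + 1, col0, offset) * (1 - fx)
                          + value(row0 + 1, col0 + 1, offset) * fx)
                coords.append(top * (1 - fy) + bottom * fy)
            u, v = coords
            if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
                continue  # Step 6: no recorded data for this pixel
            px = min(int(u * raw_width), raw_width - 1)    # Step 7: denormalize
            py = min(int(v * raw_height), raw_height - 1)
            out[ty * target_width + tx] = raw[py * raw_width + px]  # Steps 8-9
    return out
```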

Performing bilinear interpolation by looping over every image pixel is still too slow in Python. Instead, you can use functions provided by the OpenCV library to perform the interpolation. First, convert the distortion data into a format that can be used by the cv2.remap() function:

import cv2, Leap, ctypes
import numpy as np

def convert_distortion_maps(image):

    distortion_length = image.distortion_width * image.distortion_height
    xmap = np.zeros(distortion_length//2, dtype=np.float32)
    ymap = np.zeros(distortion_length//2, dtype=np.float32)

    #split the interleaved (x, y) pairs and scale them to pixel coordinates;
    #writing them in reverse order rotates the maps by 180 degrees
    for i in range(0, distortion_length, 2):
        xmap[distortion_length//2 - i//2 - 1] = image.distortion[i] * image.width
        ymap[distortion_length//2 - i//2 - 1] = image.distortion[i + 1] * image.height

    xmap = np.reshape(xmap, (image.distortion_height, image.distortion_width//2))
    ymap = np.reshape(ymap, (image.distortion_height, image.distortion_width//2))

    #resize the distortion map to equal desired destination image size
    resized_xmap = cv2.resize(xmap,
                              (image.width, image.height),
                              interpolation = cv2.INTER_LINEAR)
    resized_ymap = cv2.resize(ymap,
                              (image.width, image.height),
                              interpolation = cv2.INTER_LINEAR)

    #Use faster fixed point maps
    coordinate_map, interpolation_coefficients = cv2.convertMaps(resized_xmap,
                                                                 resized_ymap,
                                                                 cv2.CV_16SC2,
                                                                 nninterpolation = False)

    return coordinate_map, interpolation_coefficients

And then pass the maps and the corresponding image to the cv2.remap() function:

def undistort(image, coordinate_map, coefficient_map, width, height):
    #wrap image data in numpy array (no copy)
    i_address = int(image.data_pointer)
    ctype_array_def = ctypes.c_ubyte * image.height * image.width
    # as ctypes array
    as_ctype_array = ctype_array_def.from_address(i_address)
    # as numpy array
    as_numpy_array = np.ctypeslib.as_array(as_ctype_array)
    img = np.reshape(as_numpy_array, (image.height, image.width))

    #remap image to destination
    destination = cv2.remap(img,
                            coordinate_map,
                            coefficient_map,
                            interpolation = cv2.INTER_LINEAR)

    #resize output to desired destination size
    destination = cv2.resize(destination,
                             (width, height),
                             interpolation = cv2.INTER_LINEAR)
    return destination

Note that you should avoid converting the distortion maps every frame. They only change when a different device is plugged in, the image reverses orientation (when hands enter from the opposite side), or the device is recalibrated. The following code only converts the distortion maps once (but doesn’t handle the cases where the distortion maps can change):

def run(controller):
    maps_initialized = False
    while True:
        frame = controller.frame()
        left_image = frame.images[0]
        right_image = frame.images[1]
        if left_image.is_valid and right_image.is_valid:
            if not maps_initialized:
                left_coordinates, left_coefficients = convert_distortion_maps(left_image)
                right_coordinates, right_coefficients = convert_distortion_maps(right_image)
                maps_initialized = True

            undistorted_left = undistort(left_image, left_coordinates, left_coefficients, 400, 400)
            undistorted_right = undistort(right_image, right_coordinates, right_coefficients, 400, 400)

            #display images
            cv2.imshow('Left Camera', undistorted_left)
            cv2.imshow('Right Camera', undistorted_right)

        #waitKey() also lets the windows refresh; press q to quit
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

def main():
    controller = Leap.Controller()
    controller.set_policy(Leap.Controller.POLICY_IMAGES)
    try:
        run(controller)
    except KeyboardInterrupt:
        pass
    finally:
        cv2.destroyAllWindows()

if __name__ == '__main__':
    main()

Draw Tracking Data over Image

It is reasonably straightforward to draw representations of the Leap Motion tracking data over the camera image. If you have drawn the raw image data to a bitmap, you can find the pixel corresponding to a Leap Motion position using the warp() function.

Converting a position in Leap Motion coordinates to horizontal and vertical slopes (from the camera perspective) requires knowing how far the cameras are from the origin of the Leap Motion coordinate system. For the current peripheral version, the offset on the x-axis is 20 mm to either side. The cameras are on the x-axis, so there is no z offset. The slope is simply the distance from the camera in the image plane (the x-coordinate for the horizontal slope; the z-coordinate for the vertical slope) divided by the distance to the image plane, the y-coordinate. The following diagram illustrates the geometry for the horizontal slope:

The calculation is shown for the left camera; add the offset distance instead of subtracting for the right camera.
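A minimal sketch of this calculation follows. It assumes the standard Leap Motion axes, with y pointing up from the device, so the distance to the image plane is the y-coordinate. It also assumes the 20 mm offset of the current peripheral. Following the text, the left camera's slope subtracts the offset and the right camera's adds it; here that is expressed by passing camera_x = 20 or camera_x = -20. Whether your rendering also needs the horizontal slope negated depends on the image orientation, so verify it against a real frame. The names are hypothetical.

```python
CAMERA_OFFSET_MM = 20.0  # x offset of each camera on the current peripheral (from the text)


def position_to_slopes(position, camera_x):
    """Convert a Leap Motion position (x, y, z in mm) to (horizontal, vertical) slopes."""
    x, y, z = position
    if y <= 0:
        return None  # the point is not above the cameras
    y = float(y)
    horizontal = (x - camera_x) / y  # sideways offset from the camera over height
    vertical = z / y                 # cameras sit on the x-axis: no z offset
    return horizontal, vertical


left = position_to_slopes((0.0, 200.0, 50.0), CAMERA_OFFSET_MM)    # subtract offset
right = position_to_slopes((0.0, 200.0, 50.0), -CAMERA_OFFSET_MM)  # add offset
```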

Once you know the ray slope values, you can get the pixel coordinates using warp().

Note: The offset can be different for different form factors of the Leap Motion controller, but there is currently no way to get this value from the API.

If you have rendered the corrected image data, then correlating tracking data to the image data depends on how you rendered the image. For 3D scenes, this is a matter of using a consistent scale and correct placement of the textured quad showing the camera image. For other types of rendering, you must convert the ray slopes representing a Leap Motion position to a target image pixel according to the way that you corrected the image data.

Calculate the Direction to an Image Feature

Get the direction to an image feature with the Image.rectify() function. Image.rectify() returns a vector containing the horizontal and vertical slopes (as defined from the camera point of view) given the pixel coordinates in the raw image data.

If you can identify the same feature in both images with sufficient accuracy, you can triangulate the 3D position using the set of slope values from the two cameras.
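Given the slopes of a matched feature in both images (for example from Image.rectify()), triangulation reduces to solving two linear equations. The sketch assumes both cameras sit on the x-axis at positions you pass in, and that each slope follows horizontal = (x - camera_x) / y and vertical = z / y. It averages the two vertical estimates for z. The function name is hypothetical. Real feature matches are noisy, so expect some error in the result.

```python
def triangulate(left_slopes, right_slopes, left_camera_x, right_camera_x):
    """Recover (x, y, z) from the ray slopes of one feature seen by both cameras."""
    hl, vl = left_slopes
    hr, vr = right_slopes
    disparity = hl - hr
    if disparity == 0:
        return None  # parallel rays: the feature is effectively at infinity
    # x = camera_x + h * y holds for both cameras; subtracting the two gives y.
    y = (right_camera_x - left_camera_x) / float(disparity)
    x = left_camera_x + hl * y
    z = 0.5 * (vl + vr) * y  # average the two vertical estimates
    return x, y, z
```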

Head-Mounted Display Mode

The Leap Motion service/daemon software provides a mode that optimizes tracking when the Leap Motion hardware is attached to a head-mounted display. In this mode, the Leap Motion software expects to view hands from the top rather than the bottom. When it is ambiguous whether the palm of a hand is facing toward or away from the Leap Motion sensors, setting this mode makes it more likely that the software will initialize the hand model so that the palm faces away from the sensors. This makes the mode a good fit for mounting the Leap Motion device on the face of a head-mounted display rig.

To turn on the mode in your application, enable the optimize HMD policy:

controller.set_policy(Leap.Controller.POLICY_OPTIMIZE_HMD)

The policy is always denied for hardware that cannot be mounted on an HMD, such as devices embedded in laptops or keyboards.