Microsoft applys 3D motion sensing technology into XBox - Kinect, letting users interact with gaming freely without controller in hand. Camera and gesture recognition make the function of motion sensing. PrimeSense delivers the chip and structure to the 3D sensing technology, helping Microsoft to achieve controller-free entertainment device.
PrimeSense, a Israel company, comes up with a camera-based solution to 3D sensing. The module consist of :
- Color image sensor
- IR image sensor for depth measurement
- IR light source - laser diode and diffuser
- color image size: UXGA 1600x1200
- depth image size: VGA 640x480
- image throughput: 60 fps
- operation range: 0.8m~3.5m
- spatial x/y resolution: 3mm @2m distance
- depth z resolution: 1cm @2m distance
There are two speckle pattern could appear on the camera: one being primary speckle coming from the diffuser(projected on the object) and the other being secondary speckle formed during the imaging due to the lens aperture and object material roughness. It only concentrates in primary speckle. The primary speckle pattern, produced by the diffuser and DOE and then projected on the object, varies with z-axis. From the graph below, the object (human hand) shows different speckle pattern at different distance.
The speckle pattern is what PrimeSense names "structured light". Why the structured light has such property? ....See DOE below.
3D correlation will construct the 3D mapping of object. Because of the distance-dependent property, each position has its specific shape and spacing. Processor estimate the depth by correlating each window with the reference data.
For relative motion in z-axis, the speckle pattern will move along the x-axis on image plane of camera, because camera see the object at different angle. Based on the triangulation principle, correlation with x-direction can calculate the z-shift as the absolute distance is known from 3D mapping and the distance between projector and camera is defined. Each successive image is served as a reference image for the next frame calculation.
For example, object locates at 2m, spacing between projector and camera is 20cm, and given pixel size of sensor is 6um --> z-resolution is 15mm, calculated on triangulation relationship.
Calibration is carried out the the time of manufacture. A set of reference images were taken at different locations then stored in the memory.
Diffractive Optical Element (DOE)
In the illumination parts:
- laser diode
- diffuser - produce pseudo random pattern
Based on the extended depth of field (EDof), DOE is the embodiment of astigmatic optical element, which has different focus for different angle direction. From the picture above, the speckle of pseudo random distribution of diffuser would be focused to spot of different orientation at different focal distance.
PrimeSense's 3D sensing technology is very unique and special. There is no one has the similar patent in depth measurement. PrimeSense claims that the profile like human face can be distinct at 1m distance but loses the detail as it moves far away.
How kinect works with PrimeSense-2
Microsoft: 3D Gesture Recognition