Zenithal 3D tracking with Kinect and OpenCV
We finally got the code for the “detection room” installation working. There were a few problems to sort out before achieving acceptable accuracy.
Grayscale data vs raw data
The first problem we encountered was heavy stepping in the point cloud. Using the 8-bit grayscale depth buffer made the stepping even worse, because of the downscaling from the 11-bit raw depth to the 8-bit grayscale image. The 11-bit image has a range from 0 to 2047, but after the disparity calculation the usable values go approximately from 500 to 1000, which, converted to meters, correspond to the range limits of the Kinect sensor. So accuracy is lost when squeezing those roughly 500 (1000 - 500) valid values into the 256 possible values of an 8-bit grayscale image.
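To make the loss concrete, here is a small Python sketch. It converts raw 11-bit values to meters with an empirical fit popularized by the OpenKinect community (an assumption here, not necessarily the conversion we used; constants differ slightly per device) and applies the naive linear downscale of the usable 500 to 1000 raw range onto 8-bit gray levels. The function names are illustrative.

```python
def raw_to_meters(raw: int) -> float:
    """Approximate depth in meters for an 11-bit Kinect v1 raw value.

    Assumed empirical fit (OpenKinect community); only meaningful inside
    the usable range of roughly 500 to 1000.
    """
    if raw >= 2047:  # 2047 is commonly used as the "no reading" value
        return float("nan")
    return 1.0 / (raw * -0.0030711016 + 3.3309495161)


def raw_to_gray8(raw: int, lo: int = 500, hi: int = 1000) -> int:
    """Naive downscale: clamp to [lo, hi] and map linearly onto 0..255."""
    raw = min(max(raw, lo), hi)
    return round((raw - lo) * 255 / (hi - lo))


usable = range(500, 1001)
gray_levels = {raw_to_gray8(r) for r in usable}
print(f"{len(usable)} usable raw values -> {len(gray_levels)} gray levels")
print(f"usable depth: {raw_to_meters(500):.2f} m to {raw_to_meters(1000):.2f} m")
```

Roughly two neighbouring raw values end up sharing each gray level, so every depth step in the 8-bit image is about twice as coarse as in the raw data.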
The stepping improved slightly when we switched to a float image, assigning to each pixel its real-world depth in meters.
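With depth stored in meters per pixel, the point cloud comes from back-projecting each pixel through a pinhole camera model. The sketch below assumes approximate, commonly quoted Kinect v1 depth-camera intrinsics (FX, FY, CX, CY are not values measured for our unit) and the usual camera axes: x to the right, y down, z forward.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]

# Assumed intrinsics for a 640x480 Kinect v1 depth image; calibrate your own device.
FX, FY = 594.2, 591.0
CX, CY = 339.3, 242.7


def depth_image_to_points(depth_m: List[List[float]]) -> List[Point]:
    """Back-project a float depth image (meters per pixel) to 3D camera coordinates.

    Pixels without a valid reading (NaN or <= 0) are skipped.
    """
    points: List[Point] = []
    for v, row in enumerate(depth_m):          # v = image row
        for u, z in enumerate(row):            # u = image column
            if math.isnan(z) or z <= 0.0:
                continue
            x = (u - CX) * z / FX
            y = (v - CY) * z / FY
            points.append((x, y, z))
    return points


tiny = [[1.0, float("nan")],
        [0.0, 2.0]]
print(depth_image_to_points(tiny))
```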
Angled position of the sensor
Another problem was that we couldn’t attach the sensor to the ceiling (which is about 5 meters high), and even if we could, the sensor noise at that distance would make the data useless. Therefore, we decided to attach the sensor to a wall, 3 m above the floor, angled down to take advantage of its vertical field of view.
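The trade-off of tilting the sensor can be estimated with basic trigonometry: the lower and upper edges of the vertical field of view hit the floor at different horizontal distances from the wall. This sketch assumes a pure downward tilt and a vertical field of view of about 43 degrees, the figure usually quoted for Kinect v1; the tilt values are hypothetical.

```python
import math


def floor_coverage(height_m: float, tilt_deg: float, vfov_deg: float = 43.0):
    """Return (near_m, far_m): horizontal floor distances from the wall seen by the sensor.

    height_m: mounting height above the floor.
    tilt_deg: downward tilt of the optical axis below horizontal (hypothetical).
    near_m is 0 when the lower edge of the view points straight down or further.
    far_m is infinite when the upper edge of the view is at or above the horizon.
    """
    half = vfov_deg / 2.0
    steep = tilt_deg + half      # lower image edge, degrees below horizontal
    shallow = tilt_deg - half    # upper image edge, degrees below horizontal
    near = 0.0 if steep >= 90.0 else height_m / math.tan(math.radians(steep))
    far = math.inf if shallow <= 0.0 else height_m / math.tan(math.radians(shallow))
    return near, far


for tilt in (20.0, 35.0, 50.0):
    near, far = floor_coverage(3.0, tilt)
    print(f"tilt {tilt:4.1f} deg: floor visible from {near:.2f} m to {far:.2f} m")
```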
This setup forced us to apply a 3D rotation and translation to the point cloud, so that we could detect the highest point of each blob relative to the floor rather than the point nearest to the sensor.
The steps for detecting the highest point relative to the floor are these:
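A minimal sketch of the rotation and translation, assuming the sensor is only pitched downward by a known angle (no roll or yaw) with its optical centre at a known height; the axis conventions and function names are assumptions, not necessarily our exact procedure. After the transform, Z is the height above the floor, so the highest point is simply the one with the largest Z.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def camera_to_floor(points: List[Point], tilt_deg: float, height_m: float) -> List[Point]:
    """Rotate and translate camera-frame points into a floor-aligned frame.

    Camera frame: x right, y down (image rows), z along the optical axis.
    Floor frame:  X right, Y horizontal away from the wall, Z up.
    """
    s, c = math.sin(math.radians(tilt_deg)), math.cos(math.radians(tilt_deg))
    out: List[Point] = []
    for x, y, z in points:
        X = x
        Y = -s * y + c * z             # horizontal distance from the wall
        Z = height_m - c * y - s * z   # height above the floor
        out.append((X, Y, Z))
    return out


def highest_point(points: List[Point]) -> Point:
    """Highest point relative to the floor (largest Z), not the nearest to the sensor."""
    return max(points, key=lambda p: p[2])


# A lower point close to the sensor vs. a higher point further away (tilt 30 deg, 3 m high).
cam_pts = [(0.0, 0.5, 1.5), (0.0, -0.5, 3.0)]
floor_pts = camera_to_floor(cam_pts, tilt_deg=30.0, height_m=3.0)
print("nearest to sensor:", min(cam_pts, key=lambda p: math.hypot(*p)))
print("highest above floor:", highest_point(floor_pts))
```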
Occlusion
The detection of the highest point of each blob was successful. But we encountered a problem: if two people were too close together, so that from the camera’s point of view they formed a single blob, only one highest point would be detected for both of them, instead of one highest point per person.
To solve this issue, we generated a completely new zenithal image from the projection of the rotated 3D point cloud. The detection was then done on this image as seen from above, where two people standing next to each other appear as two separate blobs.
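The zenithal projection can be sketched as a height map: every rotated point (X, Y, Z) falls into a floor grid cell, each cell keeps its maximum height, and 8-connected occupied cells are grouped into blobs. This is plain Python to keep the example self-contained; cell_m, min_height_m and the function names are hypothetical. With OpenCV, the grid could instead be rasterized into a float image and labelled with cv2.connectedComponents.

```python
import math
from collections import deque
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]  # floor frame: X, Y on the floor, Z = height
Cell = Tuple[int, int]


def zenithal_height_map(points: List[Point], cell_m: float = 0.05,
                        min_height_m: float = 0.5) -> Dict[Cell, float]:
    """Project floor-frame points straight down onto a grid, keeping the max height per cell.

    Points below min_height_m (floor, low furniture) are discarded.
    """
    hmap: Dict[Cell, float] = {}
    for x, y, z in points:
        if z < min_height_m:
            continue
        cell = (math.floor(x / cell_m), math.floor(y / cell_m))
        if z > hmap.get(cell, -math.inf):
            hmap[cell] = z
    return hmap


def blob_tops(hmap: Dict[Cell, float], cell_m: float = 0.05) -> List[Point]:
    """Group 8-connected occupied cells into blobs; return each blob's highest point."""
    seen = set()
    tops: List[Point] = []
    for start in hmap:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        best = start
        while queue:  # breadth-first flood fill over one blob
            cx, cy = queue.popleft()
            if hmap[(cx, cy)] > hmap[best]:
                best = (cx, cy)
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in hmap and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        # Report the centre of the highest cell together with its height.
        tops.append(((best[0] + 0.5) * cell_m, (best[1] + 0.5) * cell_m, hmap[best]))
    return tops


# Two people about 40 cm apart plus a floor point: seen from above they form two blobs.
room = [(0.02, 1.02, 1.6), (0.07, 1.02, 1.8), (0.12, 1.02, 1.6),
        (0.52, 1.02, 1.7), (0.57, 1.02, 1.5), (0.30, 1.02, 0.0)]
print(blob_tops(zenithal_height_map(room)))
```

The cell size sets the trade-off: smaller cells separate people standing closer together, but leave more holes in the sparse far part of the point cloud.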
With detection working, we calculated the distance from each blob’s highest point to the PTZ camera unit and aimed the camera at the nearest one.
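The aiming step can be sketched as choosing the blob top with the smallest Euclidean distance to the PTZ unit and turning the offset into pan and tilt angles with atan2. The PTZ position, the angle conventions and the name aim_ptz are hypothetical; a real PTZ unit has its own zero offsets and control protocol, which are not modelled here.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]  # floor frame: X, Y on the floor, Z = height


def aim_ptz(ptz_pos: Point, tops: List[Point]) -> Optional[Tuple[Point, float, float]]:
    """Return (target, pan_deg, tilt_deg) for the blob top nearest to the PTZ unit.

    Pan is measured around the vertical axis from the +Y direction; tilt is the
    elevation angle (negative means looking down). Returns None if nobody is detected.
    """
    if not tops:
        return None
    target = min(tops, key=lambda p: math.dist(ptz_pos, p))
    dx, dy, dz = (t - c for t, c in zip(target, ptz_pos))
    pan = math.degrees(math.atan2(dx, dy))
    tilt = math.degrees(math.atan2(dz, math.hypot(dx, dy)))
    return target, pan, tilt


print(aim_ptz((0.0, 0.0, 2.5), [(1.0, 1.0, 1.8), (3.0, 0.0, 1.7)]))
```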
Here is the system in action:







