Image By QueSera4710 - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=31586266
A little intro
Let me tell you about my adventures with Computer Vision.
My first project with this company involved object tracking. To achieve this with a high degree of accuracy we decided to use the Kinect V2.0, mainly to take advantage of its depth sensor. The Kinect SDK has object detection tools right out of the box, but they are geared towards the human body and face rather than other shapes. Despite this, the raw depth data can still be accessed.
In order to find a solution I loaded up my favourite search engine and looked for a way to detect objects with the depth data I had at my disposal.
Computer Vision using OpenCV
I eventually came across this blog post, which is a tutorial on the functionality I required: http://blogs.claritycon.com/blog/2012/11/blob-tracking-kinect-opencv-wpf/
The application created from the tutorial uses a Computer Vision library to process the depth data and detect objects.
What is Computer Vision?
Computer Vision is technology that enables computers to identify, analyse and understand images. It is used in sectors ranging from robotics to medicine.
OpenCV is a cross-platform, open-source C/C++ library that provides Computer Vision functionality. At the time of writing it runs on Windows, Linux, Mac, Android and iOS. It can compare images and highlight the differences, or find certain shapes (ranging from squares and circles to the human face) inside an image.
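To illustrate the image-comparison idea, here is a hand-rolled sketch (not OpenCV's actual code; OpenCV offers this sort of operation via functions such as absdiff) using plain lists to stand in for grayscale images. Pixels that differ between two frames light up in the difference image:

```python
# Conceptual sketch of per-pixel image comparison. Plain nested lists
# stand in for grayscale images; real OpenCV code would use image
# matrices instead.

def abs_diff(img_a, img_b):
    """Per-pixel absolute difference of two same-sized grayscale images."""
    return [
        [abs(a - b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(img_a, img_b)
    ]

# One pixel changed between the two frames, so only it is non-zero
# in the difference image.
before = [[10, 10], [10, 10]]
after  = [[10, 90], [10, 10]]
print(abs_diff(before, after))  # [[0, 80], [0, 0]]
```

Thresholding such a difference image is one simple way to spot where something moved between frames.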
Emgu CV is a cross-platform .NET wrapper for the OpenCV library. It allows OpenCV functions to be called from .NET languages, including C#, and can be compiled with Visual Studio, Xamarin Studio and Unity. It runs on the same operating systems supported by OpenCV, and also on Windows Phone.
How can OpenCV utilise the depth data?
In the tutorial there is a clever function that converts the Kinect depth data (at certain depths) into a 2D bitmap image. In the image, the pixels that represent an object are white and the pixels that represent empty space are black.
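The core of that conversion can be sketched like this (my own simplified illustration, not the tutorial's exact code; the depth band limits and the flat-array frame layout are assumptions):

```python
# Conceptual sketch of the depth-to-bitmap idea: depth readings inside a
# target band become white pixels (255), everything else becomes black (0).

WHITE, BLACK = 255, 0

def depth_to_bitmap(depth_frame, min_depth_mm=500, max_depth_mm=1500):
    """Map each depth reading (in millimetres) to a black/white pixel."""
    return [
        WHITE if min_depth_mm <= d <= max_depth_mm else BLACK
        for d in depth_frame
    ]

# Example: a tiny 4-pixel "frame" with two readings inside the band.
frame = [0, 800, 2000, 1499]
print(depth_to_bitmap(frame))  # [0, 255, 0, 255]
```

The resulting black-and-white frame is what gets handed to OpenCV for shape detection.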
The images (frames) rendered by this function are fed into the OpenCV library, which compares them and detects the shapes formed. OpenCV also gives you tools to filter the detected objects, such as minimum blob (object) size, maximum blob size and threshold. Note that the depth data contains noise, most of which can be filtered out using OpenCV's functions.
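The min/max size filtering works roughly like this (a minimal sketch of the idea, not Emgu CV's API): connected white regions are found, then any blob whose pixel count falls outside the size limits, such as a lone noise pixel, is discarded.

```python
# Minimal blob detection on a binary bitmap: flood-fill each connected
# white region (4-connectivity), then keep only blobs whose pixel count
# falls between a minimum and maximum size.

def find_blobs(bitmap, min_size=2, max_size=100):
    """Return lists of (row, col) pixels, one list per accepted blob."""
    rows, cols = len(bitmap), len(bitmap[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if bitmap[r][c] == 255 and not seen[r][c]:
                # Flood fill one connected white region.
                stack, blob = [(r, c)], []
                seen[r][c] = True
                while stack:
                    y, x = stack.pop()
                    blob.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and bitmap[ny][nx] == 255
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                if min_size <= len(blob) <= max_size:  # the size filter
                    blobs.append(blob)
    return blobs

# The lone white pixel (noise) is dropped; the 3-pixel blob survives.
bitmap = [
    [255,   0,   0],
    [255, 255,   0],
    [  0,   0, 255],
]
print(len(find_blobs(bitmap, min_size=2)))  # 1
```

This is also why noise in the depth data is manageable: isolated noisy pixels form tiny blobs that the minimum-size filter simply throws away.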
Adapting the code to Kinect V2.0
Since the tutorial was published quite a while ago, it makes use of the Kinect V1.0 and its SDK. The Kinect V1.0 and V2.0 SDKs are significantly different, so the code had to be modified to work with my project.
Keeping with the GitHub spirit, I forked the code that was hosted on GitHub and applied my changes. You can see it here: https://github.com/drahcirsama/OpenCV-WPF-KinectV2