
3D Reconstruction

Here you will find my solution to the "3D reconstruction" exercise. The goal of this exercise is to reconstruct a 3D scene using only two cameras.


Project details

We are going to solve this exercise using the Unibotics website (unibotics.org), which gives us access to a code editor and a simulator where we can see the result of our code. You can find the exercise at the link below.


The goal of this exercise is to reconstruct a 3D scene (represented by objects such as Mario, Bowser, a duck, and some cubes) using only two cameras observing the scene from two different positions.


There are two important aspects of robotics to keep in mind: the robustness of the program and its execution time (which should be real time).

The scene

This image shows the 3D scene that we will try to reconstruct.

My solution

To reconstruct this 3D scene we will use epipolar lines and stereo reconstruction.
The solution is divided into five parts:
 

1) Detection of the feature points in the left camera image (with the Canny edge detector).


(Optional) Reduce the number of feature points found by the Canny edge detector.


2) Calculate the back-projection ray of every feature point.


3) Calculate the epipolar line for every feature point.


4) Find, for every feature point of the left camera, the corresponding pixel in the right camera (thanks to the epipolar line).


5) Calculate the 3D position of every point using triangulation.

0. The data provided

In order to solve this problem we have access to two cameras that observe the scene from two different locations.

 

Here are the images from the two cameras:


Left and right images of the 3D scene.

1. Feature points

To start the reconstruction we have to choose which points will be reconstructed. For this we will use feature points: points that lie on edges. We will detect these points with the Canny edge detector.


The Canny edge detector is an edge detection operator that uses a multi-stage algorithm to detect a wide range of edges in images. It was developed by John F. Canny in 1986.


The result of the Canny algorithm on the image from the left camera:

Result of the Canny algorithm on the left image.
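As a minimal sketch, the edge detection step can be done with OpenCV as below. The image path and the hysteresis thresholds 100/200 are illustrative and would need tuning for the simulator images (in Unibotics the image comes from the camera API rather than a file):

```python
import cv2
import numpy as np

# Load the left image (illustrative path) and convert it to grayscale,
# since Canny works on single-channel images.
img_left = cv2.imread("left_image.png")
gray = cv2.cvtColor(img_left, cv2.COLOR_BGR2GRAY)

# Canny with illustrative hysteresis thresholds: gradients above 200 are
# strong edges; those between 100 and 200 are kept only if connected to one.
edges = cv2.Canny(gray, 100, 200)

# The feature points are the (x, y) coordinates of the edge pixels.
ys, xs = np.nonzero(edges)
feature_points = list(zip(xs, ys))
```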

(Optional) Reducing feature points

If too many feature points are detected by the Canny algorithm, their number can be reduced with the reduceNumberOfPoints() function.


That reduces the amount of data to be processed: the result will be of lower quality, but the program will run faster (roughly in proportion to the number of points removed).
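One simple way to implement such a reduction (a sketch of one possible strategy, not necessarily what reduceNumberOfPoints() does internally) is to keep only every n-th edge point, which cuts the workload while preserving the spatial spread of the edges:

```python
def reduce_points(points, keep_ratio=0.5):
    # Keep a regularly spaced subset of the feature points, e.g.
    # keep_ratio=0.5 keeps roughly every second point.
    step = max(1, round(1.0 / keep_ratio))
    return points[::step]

# e.g. halve the number of points detected by Canny:
# feature_points = reduce_points(feature_points, keep_ratio=0.5)
```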

2. Back-projection ray

The back-projection ray is a line that represents all the possible 3D positions of a point in the scene. This line passes through the optical centre (the left camera's position) and through the coordinates of the point on the image. The aim is to calculate the back-projection ray of every feature point in order to find the 3D position of each of them.

The back-projection ray is the line through O and P(X,Y,Z).
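A minimal sketch of this computation under a pinhole camera model; the intrinsic matrix K, the rotation R and the optical centre O below are illustrative placeholders for the exercise's actual camera calibration:

```python
import numpy as np

# Illustrative pinhole parameters: focal length f, principal point (cx, cy).
f, cx, cy = 500.0, 320.0, 240.0
K = np.array([[f, 0.0, cx],
              [0.0, f, cy],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                  # camera-to-world rotation
O = np.array([0.0, 0.0, 0.0])  # optical centre = camera position in the world

def backprojection_ray(u, v):
    # Direction of the ray through pixel (u, v): every 3D point
    # P(t) = O + t * d with t > 0 projects onto that pixel.
    d = R @ np.linalg.inv(K) @ np.array([u, v, 1.0])
    return O, d / np.linalg.norm(d)
```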

3. Epipolar line

The epipolar line is what allows us to find, for a point in the left image, the corresponding point in the right image. Since our two cameras are at the same height, we only need to draw a horizontal line in the right image with the same y coordinate as the point in the left image.

Epipolar line explained
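Because the epipolar line is horizontal here, the search region in the right image reduces to a thin band of rows around y = v. A small sketch (half_height is an illustrative tolerance for noise and calibration error):

```python
def epipolar_strip(img_right, v, half_height=2):
    # With both cameras at the same height, the epipolar line of a left
    # pixel (u, v) is the row y = v of the right image; a narrow band
    # around it makes the search robust to small vertical offsets.
    top = max(0, v - half_height)
    bottom = min(img_right.shape[0], v + half_height + 1)
    return img_right[top:bottom, :], top  # strip and its vertical offset
```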

4. Finding corresponding points

To find the corresponding points we will use the cv2.matchTemplate() function of OpenCV, searching only in the area covered by the epipolar line. The function slides the template over the image and compares the overlapped patches of size w×h against the template. I used the cv2.TM_CCOEFF_NORMED method (shifted-mean cross-correlation).

Shifted mean cross-correlation, used in the matchTemplate() function.
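For reference, OpenCV documents the cv2.TM_CCOEFF_NORMED score as the normalized cross-correlation of the mean-shifted template T' and the mean-shifted image patch I':

$$R(x,y) = \frac{\sum_{x',y'} T'(x',y')\, I'(x+x',\, y+y')}{\sqrt{\sum_{x',y'} T'(x',y')^{2} \cdot \sum_{x',y'} I'(x+x',\, y+y')^{2}}}$$

where $T'(x',y') = T(x',y') - \frac{1}{w \cdot h}\sum_{x'',y''} T(x'',y'')$, and I' is shifted in the same way by the mean of the patch under the template.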

Example of the matchTemplate() function.
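Putting the epipolar band and the template matching together, here is a sketch of the correspondence search; the patch size, the band width and the 0.8 acceptance threshold are illustrative values, not the exercise's official ones:

```python
import cv2

def find_correspondence(img_left, img_right, u, v, patch=10, band=2):
    # Template: a (2*patch+1) x (2*patch+1) window around the left point.
    template = img_left[v - patch:v + patch + 1, u - patch:u + patch + 1]
    if template.shape[:2] != (2 * patch + 1, 2 * patch + 1):
        return None  # too close to the image border

    # Search only in a thin band around the epipolar row y = v.
    top = max(0, v - patch - band)
    strip = img_right[top:v + patch + band + 1, :]

    scores = cv2.matchTemplate(strip, template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(scores)
    if max_val < 0.8:  # reject weak matches for robustness
        return None
    # matchTemplate reports the top-left corner of the best window;
    # shift back to the window centre and to full-image coordinates.
    return max_loc[0] + patch, top + max_loc[1] + patch
```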

5. Triangulation

Once we have obtained the back-projection ray of the point in the left image and that of its corresponding point in the right image, we need only look for where these two rays intersect. This intersection (or, since the rays rarely cross exactly, the point where the distance between them is smallest) will be the 3D position of our point in the scene.

The point X is the reconstructed 3D point in the scene.
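A minimal sketch of this step using the midpoint method, assuming each ray is given as an origin and a unit direction (as computed in step 2):

```python
import numpy as np

def triangulate(o1, d1, o2, d2):
    # Find the parameters t1, t2 that bring the two rays
    # P1 = o1 + t1*d1 and P2 = o2 + t2*d2 closest together:
    # solve [d1 -d2] [t1, t2]^T = o2 - o1 in the least-squares sense.
    A = np.stack([d1, -d2], axis=1)
    t, *_ = np.linalg.lstsq(A, o2 - o1, rcond=None)
    p1 = o1 + t[0] * d1
    p2 = o2 + t[1] * d2
    # The rays rarely intersect exactly; the midpoint of the shortest
    # segment between them is taken as the 3D point.
    return (p1 + p2) / 2.0
```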

6. Results

I have achieved different results, one with 10,000 feature points and one with 18,300 feature points. We can see in the results that the more texture an object has, the better it is represented in 3D (for example, the duck does not have much texture and barely shows up in the reconstruction). This is because the Canny edge detector responds to edges, and therefore to texture.

Result with 10,000 feature points.

Result with 18,300 feature points.
