
Reconstructing the scene

Next, we look at recovering the 3D structure of the scene from the information we have acquired so far. As before, we should first consider the tools and information we have at hand. In the preceding section, we obtained two camera matrices from the essential matrix; we already discussed how these would be useful for obtaining the 3D position of a point in space. We can then go back to our matched point pairs to fill in the equations with numerical data. The point pairs will also be useful in calculating the error in all of our approximate calculations.

This is the time to see how we can perform triangulation using OpenCV. Luckily, OpenCV supplies us with a number of functions that make this process easy to implement: triangulatePoints, undistortPoints, and convertPointsFromHomogeneous.

Remember that we had two key equations arising from the 2D point matching and the P matrices: x = PX and x' = P'X, where x and x' are matching 2D points and X is a real-world 3D point imaged by the two cameras. If we examine these equations, we will see that the x vector representing a 2D point should be of size 3x1 and the X vector representing a 3D point should be 4x1. Both points receive an extra entry in the vector; these are called homogeneous coordinates, and we use them to streamline the triangulation process.
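
To make this extra entry concrete, here is a minimal sketch of moving a point into and out of homogeneous form using OpenCV's built-in conversion functions (the point coordinates are arbitrary, and the chapter's usual using namespace cv context is assumed):

    vector<Point2f> pts2d = { Point2f(160, 120) }; //(x, y) 

    //(x, y) -> (x, y, 1): append the extra homogeneous entry 
    Mat ptsHomog; 
    convertPointsToHomogeneous(pts2d, ptsHomog); 

    //(x, y, w) -> (x/w, y/w): divide by the last entry to go back 
    Mat ptsBack; 
    convertPointsFromHomogeneous(ptsHomog, ptsBack);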

The equation x = PX (where x is a 2D image point, X is a world 3D point, and P is a camera matrix) is missing a crucial element: the camera calibration parameters matrix, K. The matrix K is used to transform 2D image points from pixel coordinates to normalized coordinates (roughly in the [-1, 1] range), removing the dependency on the size of the image in pixels, which is necessary for triangulation. For example, a 2D point x1 = (160, 120) in a 320x240 image transforms to x1' = (0, 0) when the principal point lies at the image center. To that end, we use the undistortPoints function:

    vector<Point2f> points2d; //input: 2D points in pixel coordinates (x, y) 
    Mat normalizedPts;        //output: 2D points in normalized coordinates (x', y') 

    undistortPoints(points2d, normalizedPts, K, Mat());
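
With an empty distortion coefficient list, as in the call above, undistortPoints essentially applies the inverse of K. A rough sketch of that arithmetic, with made-up intrinsics whose principal point sits at the center of a 320x240 image, shows why (160, 120) maps to the origin:

    //made-up intrinsics for illustration: focal length 320, 
    //principal point (160, 120) at the image center 
    Matx33f K(320,   0, 160, 
                0, 320, 120, 
                0,   0,   1); 

    Point2f x(160, 120);                    //pixel coordinates 
    Point2f xn((x.x - K(0, 2)) / K(0, 0),   //(x - cx) / fx 
               (x.y - K(1, 2)) / K(1, 1));  //(y - cy) / fy 
    //xn is now (0, 0): the principal point maps to the origin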

We are now ready to triangulate the normalized 2D image points into 3D world points:

    Matx34f Pleft, Pright;    //camera pose matrices for the two views 
    //... findCameraMatricesFromMatch 

    Mat normLPts;             //normalized 2D points, left image 
    Mat normRPts;             //normalized 2D points, right image 
    //... undistortPoints 

    //the result is a set of 3D points in homogeneous coordinates (4D) 
    Mat pts3dHomog; 
    triangulatePoints(Pleft, Pright, normLPts, normRPts, pts3dHomog); 

    //transpose the 4xN result to one point per row and convert 
    //from homogeneous to 3D world coordinates 
    Mat points3d; 
    convertPointsFromHomogeneous(pts3dHomog.t(), points3d);
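
The `//... findCameraMatricesFromMatch` placeholder above stands for the pose recovery step. A minimal sketch of the usual convention for filling the two Matx34f poses, assuming the relative R and t came from decomposing the essential matrix (for example, via cv::recoverPose), would be:

    Matx33f R; //relative rotation, e.g. from recoverPose 
    Matx31f t; //relative translation, e.g. from recoverPose 

    //anchor the left camera at the origin: Pleft = [I|0] 
    Matx34f Pleft = Matx34f::eye(); 

    //place the right camera at the recovered relative pose: Pright = [R|t] 
    Matx34f Pright(R(0,0), R(0,1), R(0,2), t(0), 
                   R(1,0), R(1,1), R(1,2), t(1), 
                   R(2,0), R(2,1), R(2,2), t(2));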

In the following image, we can see the triangulation result from two images of the Fountain P-11 sequence at http://cvlabwww.epfl.ch/data/multiview/denseMVS.html. The two images at the top are the original views of the scene, and the bottom pair shows the reconstructed point cloud from those views, including the estimated cameras looking at the fountain. We can see how the right-hand section of the red brick wall was reconstructed, as well as the fountain that protrudes from the wall:

However, as we discussed earlier, we have an issue with the reconstruction being correct only up to scale. We should take a moment to understand what up to scale means. The motion we obtained between our two cameras has an arbitrary unit of measurement; that is, it is not in centimeters or inches, but simply a given unit of scale. Our reconstructed cameras will be one unit of scale apart. This has big implications should we decide to recover more cameras later, as each pair of cameras would have its own unit of scale, rather than a common one.
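
If the true distance between the two cameras happens to be known from an external source (a measured baseline, for instance), the scale can be fixed by multiplying the whole reconstruction by a single factor. A hypothetical sketch, where realBaselineMeters is an assumed external measurement and t is the recovered unit-scale translation:

    const double realBaselineMeters = 0.5; //assumed external measurement 
    Mat t;        //unit-scale translation from pose recovery 
    Mat points3d; //triangulated 3D points, up to scale 
    //... recover pose and triangulate 

    const double s = realBaselineMeters / norm(t); 
    points3d *= s;       //the point cloud is now in meters 
    Mat tMetric = t * s; //the camera translation in meters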

We will now discuss how the error measure that we set up may help us find a more robust reconstruction. First, we should note what reprojection means: we simply take the triangulated 3D point and reimage it on a camera to get a reprojected 2D point, and then compare the distance between the original 2D point and the reprojected one. If this distance is large, we may have an error in triangulation, so we may not want to include this point in the final result. Our global measure is the average reprojection distance, which may give us a hint of how our triangulation performed overall. A high average reprojection error may point to a problem with the P matrices, and therefore to a possible problem with the calculation of the essential matrix or the matched feature points. To reproject points, OpenCV offers the projectPoints function:

    Matx34f P;        //camera pose matrix [R|t] 
    Mat points3d;     //triangulated 3D points 
    vector<Point2f> imgPts; //2D image points that correspond to the 3D points 
    Mat K;            //camera intrinsics matrix 

    // ... triangulate points 

    //extract the rotation (as a Rodrigues rotation vector) and translation 
    Mat rvec; 
    Rodrigues(P.get_minor<3, 3>(0, 0), rvec); 
    Mat t(P.get_minor<3, 1>(0, 3)); 

    //reproject the 3D points back into image coordinates 
    Mat projPts; 
    projectPoints(points3d, rvec, t, K, Mat(), projPts); 

    //check the individual reprojection error of each point 
    for (int i = 0; i < points3d.rows; i++) { 
      const double err = norm(projPts.at<Point2f>(i) - imgPts[i]); 

      //check if the point's reprojection error is too big 
      if (err > MIN_REPROJECTION_ERROR) { 
        // Point reprojection error is too big. 
      } 
    }
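
The global measure discussed above, the average reprojection distance, can then be obtained by averaging these per-point distances; a minimal sketch reusing the projPts and imgPts variables from the snippet above:

    //accumulate the per-point reprojection distances 
    double totalErr = 0.0; 
    for (int i = 0; i < (int)imgPts.size(); i++) { 
      totalErr += norm(projPts.at<Point2f>(i) - imgPts[i]); 
    } 

    //the global measure: mean reprojection distance in pixels 
    const double meanReprojErr = totalErr / imgPts.size();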

Next, we will take a look at recovering more cameras looking at the same scene, and combining the 3D reconstruction results.
