
Reconstruction from many views

Now that we know how to recover the motion and scene geometry from two cameras, it might seem that we could get the parameters of additional cameras and more scene points simply by applying the same process. This matter is in fact not so simple, as we can only obtain a reconstruction that is correct up to scale, and each pair of pictures gives us a different scale.

There are a number of ways to correctly reconstruct the 3D scene data from multiple views. One way to achieve camera pose estimation, or camera resectioning, is the Perspective N-Point (PnP) algorithm, where we solve for the position of a new camera using N 3D scene points that we have already recovered and their respective 2D image points. Another way is to triangulate more points and see how they fit into our existing scene geometry; this will tell us the position of the new camera by means of point cloud registration. In this section, we will discuss using OpenCV's solvePnP functions, which implement the first method.

The first step in this kind of reconstruction, incremental 3D reconstruction with camera resectioning, is to get a baseline scene structure. Since we will look for the position of any new camera based on a known structure of the scene, we need an initial structure to work with. We can use the method we discussed previously, for example between the first and second frames, to get a baseline: find the camera matrices (using the findCameraMatricesFromMatch function) and triangulate the geometry (using triangulatePoints).
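To make this concrete, here is a rough sketch of that baseline stage. The findCameraMatricesFromMatch signature, the triangulateViews wrapper (around cv::triangulatePoints), and the bookkeeping containers (matchMatrix, imageFeatures, currentCloud, cameraPoses, doneViews) are assumptions for illustration; most of them mirror the names used by the code later in this section:

    //A rough sketch of the baseline step between views 0 and 1. The helper 
    //signatures here are assumed for illustration only. 
    Matx34d Pleft = Matx34d::eye(); //first camera sits at the origin 
    Matx34d Pright;                 //pose of the second camera, to be found 

    //recover the relative pose of view 1 with respect to view 0 
    //from their 2D-2D feature matches 
    if (findCameraMatricesFromMatch(K, matchMatrix[0][1], 
                                    imageFeatures[0], imageFeatures[1], 
                                    Pleft, Pright)) { 
      //triangulate the matched features into the initial 3D point cloud 
      triangulateViews(0, 1, Pleft, Pright, 
                       imageFeatures[0], imageFeatures[1], 
                       matchMatrix[0][1], currentCloud); 

      //bookkeeping: remember the recovered poses and mark both views done 
      cameraPoses[0] = Pleft; 
      cameraPoses[1] = Pright; 
      doneViews.insert(0); 
      doneViews.insert(1); 
    }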

Having found an initial structure, we may continue; however, our method requires quite a bit of bookkeeping. First, we should note that the solvePnP function needs aligned vectors of 3D and 2D points, meaning that the ith position in one vector corresponds to the ith position in the other. To obtain these vectors, we need to find, among the 3D points that we recovered earlier, those that align with the 2D points in our new frame. A simple way to do this is to attach to each 3D point in the cloud a record of the 2D points it came from. We can then use feature matching to get matching pairs.

Let's introduce a new structure for a 3D point as follows:

    struct Point3DInMap { 
      // 3D point. 
      cv::Point3f p; 

      // Mapping from image index to a 2D point in that image's  
      // list of features that correspond to this 3D point. 
      std::map<int, int> originatingViews; 
    };

On top of the 3D point itself, it holds, for each frame that contributed to this 3D point, an index into that frame's vector of 2D feature points. The information in Point3DInMap::originatingViews must be initialized when triangulating a new 3D point, recording which cameras were involved in the triangulation. We can then use it to trace back from our 3D point cloud to the 2D point in each frame.
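To make this bookkeeping concrete, here is a minimal sketch of that initialization. The leftView, rightView, matches, and points3d names are assumptions for illustration; only the use of originatingViews follows the structure above:

    //Sketch: after triangulating the DMatch pairs between views leftView and 
    //rightView, record which 2D feature in each view produced each 3D point. 
    //'matches' holds the DMatch pairs used for triangulation and 'points3d' 
    //the triangulated coordinates in the same order (assumed names). 
    for (size_t i = 0; i < matches.size(); i++) { 
      Point3DInMap p; 
      p.p = points3d[i]; 

      //queryIdx indexes the left view's feature list, trainIdx the right's 
      p.originatingViews[leftView]  = matches[i].queryIdx; 
      p.originatingViews[rightView] = matches[i].trainIdx; 

      currentCloud.push_back(p); 
    }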

Let's add some convenience definitions:

    struct Image2D3DMatch { //Aligned vectors of 2D and 3D points 
      Points2f points2D; 
      Points3f points3D; 
    }; 

    //A mapping between an image and its set of 2D-3D aligned points 
    typedef std::map<int, Image2D3DMatch> Images2D3DMatches;

Now, let's see how to get aligned 2D-3D point vectors to use with solvePnP. The following code segment illustrates the process of finding 2D points in a new image from the existing 3D point cloud augmented with the originating 2D views. Simply put, the algorithm scans the existing 3D points in the cloud, looks at their originating 2D points, and tries to find a match (via the feature descriptors) to 2D points in the new image. If such a match is found, it may indicate that this 3D point also appears in the new image at a specific 2D point:

    Images2D3DMatches matches; 

    //scan all pending new views 
    for (size_t newView = 0; newView < images.size(); newView++) { 
      if (doneViews.find(newView) != doneViews.end()) { 
        continue; //skip views that are already done 
      } 

      Image2D3DMatch match2D3D; 

      //scan all of the current cloud's 3D points 
      for (const Point3DInMap& cloudPoint : currentCloud) { 

        //scan all originating views for that 3D cloud point 
        for (const auto& origViewAndPoint : cloudPoint.originatingViews) { 

          //check for 2D-2D matching via the match matrix 
          const int origViewIndex        = origViewAndPoint.first; 
          const int origViewFeatureIndex = origViewAndPoint.second; 

          //the match matrix is upper-triangular (not symmetric), 
          //so the left index must be the smaller one 
          const bool isLeft   = (origViewIndex < (int)newView); 
          const int leftVIdx  = (isLeft) ? origViewIndex : (int)newView; 
          const int rightVIdx = (isLeft) ? (int)newView : origViewIndex; 

          //scan all 2D-2D matches between the originating and new views 
          for (const DMatch& m : matchMatrix[leftVIdx][rightVIdx]) { 
            int matched2DPointInNewView = -1; 

            //find a match for this new view with the originating view 
            if (isLeft) { 
              //originating view is 'left' 
              if (m.queryIdx == origViewFeatureIndex) { 
                matched2DPointInNewView = m.trainIdx; 
              } 
            } else { 
              //originating view is 'right' 
              if (m.trainIdx == origViewFeatureIndex) { 
                matched2DPointInNewView = m.queryIdx; 
              } 
            } 

            if (matched2DPointInNewView >= 0) { 
              //this 3D point is matched to a 2D point in the new view 
              const Features& newFeat = imageFeatures[newView]; 

              //add the 2D point from the new view 
              match2D3D.points2D.push_back( 
                newFeat.points[matched2DPointInNewView] 
              ); 

              //add the 3D point 
              match2D3D.points3D.push_back(cloudPoint.p); 

              break; //look no further 
            } 
          } 
        } 
      } 
      matches[newView] = match2D3D; 
    }

Now that we have aligned pairings of 3D points in the scene with 2D points in the new frame, we can use them to recover the camera position as follows:

    Image2D3DMatch match; 
    //... find 2D-3D match 

    //Recover the camera pose using the 2D-3D correspondence 
    Mat rvec, tvec; 
    Mat inliers; 
    solvePnPRansac( 
      match.points3D,      //3D points 
      match.points2D,      //2D points 
      K,                   //Calibration intrinsics matrix 
      distortion,          //Calibration distortion coefficients 
      rvec,                //Output extrinsics: rotation vector 
      tvec,                //Output extrinsics: translation vector 
      false,               //Don't use an initial guess 
      100,                 //Iterations 
      RANSAC_THRESHOLD,    //Reprojection error threshold 
      0.99,                //Confidence 
      inliers              //Output: indices of the inlier correspondences 
    ); 

    //check if the inliers-to-points ratio is too small 
    //(inliers holds the indices of the inlier correspondences) 
    const float numInliers  = (float)inliers.rows; 
    const float numPoints   = (float)match.points2D.size(); 
    const float inlierRatio = numInliers / numPoints; 

    if (inlierRatio < POSE_INLIERS_MINIMAL_RATIO) { 
      cerr << "Inliers ratio is too small: " 
           << numInliers << " / " << numPoints << endl; 
      //perhaps a 'return;' statement 
    } 

    Mat_<double> R; 
    Rodrigues(rvec, R); //convert the rotation vector to a 3x3 rotation matrix 

    //compose the 3x4 camera pose matrix P = [R|t] 
    Matx34d P; 
    P(0, 0) = R(0, 0); P(0, 1) = R(0, 1); P(0, 2) = R(0, 2); 
    P(1, 0) = R(1, 0); P(1, 1) = R(1, 1); P(1, 2) = R(1, 2); 
    P(2, 0) = R(2, 0); P(2, 1) = R(2, 1); P(2, 2) = R(2, 2); 
    P(0, 3) = tvec.at<double>(0); 
    P(1, 3) = tvec.at<double>(1); 
    P(2, 3) = tvec.at<double>(2);

Note that we are using the solvePnPRansac function rather than the solvePnP function as it is more robust to outliers. Now that we have a new P matrix, we can simply use the triangulatePoints function as we did earlier and populate our point cloud with more 3D points.
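As a brief, hedged sketch of that last step, reusing the assumed triangulateViews wrapper from the baseline sketch and an assumed cameraPoses container holding the recovered pose of every reconstructed view:

    //Sketch: with the new view's pose P recovered, triangulate its matches 
    //against every view we have already reconstructed and grow the cloud. 
    //'newView' is the index of the view whose pose we just estimated. 
    cameraPoses[newView] = P; 
    for (int doneView : doneViews) { 
      //the match matrix is upper-triangular, so order the two indices 
      int leftVIdx  = std::min(doneView, (int)newView); 
      int rightVIdx = std::max(doneView, (int)newView); 

      std::vector<Point3DInMap> newPoints; 
      triangulateViews(leftVIdx, rightVIdx, 
                       cameraPoses[leftVIdx], cameraPoses[rightVIdx], 
                       imageFeatures[leftVIdx], imageFeatures[rightVIdx], 
                       matchMatrix[leftVIdx][rightVIdx], newPoints); 

      //append the new points (in practice, merge and drop duplicates) 
      currentCloud.insert(currentCloud.end(), 
                          newPoints.begin(), newPoints.end()); 
    } 
    doneViews.insert(newView);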

In the following image, we see an incremental reconstruction of the Fountain-P11 scene at http://cvlabwww.epfl.ch/data/multiview/denseMVS.html, starting from the fourth image. The top-left image is the reconstruction after four images were used; the participating cameras are shown as red pyramids with a white line showing the direction. The other images show how more cameras add more points to the cloud:
