Finding camera matrices
Now that we have obtained matches between keypoints, we can calculate the essential matrix. However, we must first align our matching points into two arrays, where an index in one array corresponds to the same index in the other. This is required by the findEssentialMat function, as we saw in the Estimating Camera Motion section. We also need to convert the KeyPoint structures to Point2f structures. We must pay special attention to the queryIdx and trainIdx member variables of DMatch, the OpenCV struct that holds a match between two keypoints, as they must align with the way we used the DescriptorMatcher::match() function. The following code section shows how to align a matching into two corresponding sets of 2D points, and how these can be used to find the essential matrix:
vector<KeyPoint> leftKpts, rightKpts;
// ... obtain keypoints using a feature extractor

vector<DMatch> matches;
// ... obtain matches using a descriptor matcher

//align left and right point sets
vector<Point2f> leftPts, rightPts;
for (size_t i = 0; i < matches.size(); i++) {
    // queryIdx is the "left" image
    leftPts.push_back(leftKpts[matches[i].queryIdx].pt);
    // trainIdx is the "right" image
    rightPts.push_back(rightKpts[matches[i].trainIdx].pt);
}

//robustly find the Essential Matrix
Mat status;
Mat E = findEssentialMat(
        leftPts,    // points from left image
        rightPts,   // points from right image
        focal,      // camera focal length factor
        pp,         // camera principal point
        cv::RANSAC, // use RANSAC for a robust solution
        0.999,      // desired solution confidence level
        1.0,        // point-to-epipolar-line threshold
        status);    // binary vector for inliers
We may later use the status binary vector to prune the points that do not align with the recovered essential matrix, keeping only the inliers. Look at the following image for an illustration of point matching after pruning. The red arrows mark feature matches that were removed in the process of finding the matrix, and the green arrows are feature matches that were retained:

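As a minimal sketch (reusing the matches and status variables from the snippet above), pruning the matching down to the inliers could look like this:

//keep only the matches that findEssentialMat marked as inliers
vector<DMatch> prunedMatches;
for (size_t i = 0; i < matches.size(); i++) {
    if (status.at<uchar>(i)) { //non-zero entry means inlier
        prunedMatches.push_back(matches[i]);
    }
}
cout << "pruned " << matches.size() << " matches down to "
     << prunedMatches.size() << " inliers" << endl;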
Now we are ready to find the camera matrices. This process is described at length in a chapter of H&Z's book; however, the new OpenCV 3 API makes things very easy for us by introducing the recoverPose function. First, we will briefly examine the structure of the camera matrix we are going to use:
P = [R | t]
This is the model for our camera pose, which consists of two elements: rotation (denoted by R) and translation (denoted by t). The interesting thing about it is that it appears in a very essential equation, x = PX, where x is a 2D point in the image and X is a 3D point in space. There is more to it, but this matrix gives us a very important relationship between the image points and the scene points. So, now that we have a motivation for finding the camera matrices, we will see how it can be done. The following code section shows how to decompose the essential matrix into the rotation and translation elements:
Mat E;
// ... find the essential matrix

Mat R, t; //placeholders for rotation and translation
Mat mask; //inlier mask, as filled in by findEssentialMat

//Find Pright camera matrix from the essential matrix.
//Cheirality check is performed internally.
recoverPose(E, leftPts, rightPts, R, t, focal, pp, mask);
Very simple. Without going too deeply into the mathematical interpretation, this conversion of the essential matrix into rotation and translation is possible because the essential matrix was originally composed of these two elements. Strictly to satisfy our curiosity, we can look at the following equation for the essential matrix, which appears in the literature: E = [t]xR, where [t]x is the skew-symmetric cross-product matrix of the translation vector t. We see that E is composed of (some form of) a translation element, t, and a rotational element, R.
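Purely as an illustrative sketch (not part of the reconstruction pipeline), we can compose E ourselves from the recovered R and t and compare it with the matrix returned by findEssentialMat; the two should agree up to an unknown scale factor, since the essential matrix is only defined up to scale:

//build the skew-symmetric cross-product matrix [t]x of the translation
Mat tx = (Mat_<double>(3, 3) <<
                    0, -t.at<double>(2),  t.at<double>(1),
      t.at<double>(2),                0, -t.at<double>(0),
     -t.at<double>(1),  t.at<double>(0),                0);

//compose E = [t]x * R; this should match the recovered E up to scale
Mat Ecomposed = tx * R;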
Note that a cheirality check is internally performed in the recoverPose function. The cheirality check makes sure that all triangulated 3D points are in front of the reconstructed camera. H&Z show that camera matrix recovery from the essential matrix has in fact four possible solutions, but the only correct solution is the one that will produce triangulated points in front of the camera, hence the need for a cheirality check. We will learn about triangulation and 3D reconstruction in the next section.
Note that what we just did only gives us one camera matrix, and for triangulation we require two camera matrices. This operation assumes that one camera matrix is fixed and canonical (no rotation and no translation, placed at the world origin):
P0 = [I | 0] =
[1 0 0 0]
[0 1 0 0]
[0 0 1 0]
The other camera, the one we recovered from the essential matrix, has moved and rotated in relation to the fixed one. This also means that any of the 3D points we recover from these two camera matrices will have the first camera at the world origin, (0, 0, 0). The assumption of a canonical camera is simply how cv::recoverPose works; however, in other situations the origin camera pose matrix may differ from the canonical one and still be valid for triangulating 3D points, as we will see later when we do not use cv::recoverPose to obtain a new camera pose matrix.
One more thing we can think of adding to our method is error checking. Many times, the calculation of an essential matrix from point matching is erroneous, and this affects the resulting camera matrices. Continuing to triangulate with faulty camera matrices is pointless. We can install a check to see if the rotation element is a valid rotation matrix. Keeping in mind that rotation matrices must have a determinant of 1 (or -1), we can simply do the following:
bool CheckCoherentRotation(const cv::Mat_<double>& R) {
    if (fabs(fabs(determinant(R)) - 1.0) > EPS) {
        cerr << "rotation matrix is invalid" << endl;
        return false;
    }
    return true;
}
Think of EPS (from Epsilon) as a very small number that helps us cope with numerical calculation limits of our CPU. In reality, we may define the following in code:
#define EPS 1E-07
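As a small, hypothetical usage sketch, we might call this check right after recoverPose and bail out if the rotation is not coherent:

Mat R, t;
// ... recover R and t with recoverPose, as shown earlier
if (!CheckCoherentRotation(Mat_<double>(R))) {
    cerr << "resulting rotation is not coherent, aborting this pair" << endl;
    return; //or handle the error in any other suitable way
}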
We can now see how all these elements combine into a function that recovers the P matrices. First, we will introduce some convenience data structures and type shorthand:
typedef std::vector<cv::KeyPoint> Keypoints;
typedef std::vector<cv::Point2f>  Points2f;
typedef std::vector<cv::Point3f>  Points3f;
typedef std::vector<cv::DMatch>   Matching;

struct Features { //2D features
    Keypoints keyPoints;
    Points2f  points;
    cv::Mat   descriptors;
};

struct Intrinsics { //camera intrinsic parameters
    cv::Mat K;
    cv::Mat Kinv;
    cv::Mat distortion;
};
Now we can write the camera matrix finding function:
void findCameraMatricesFromMatch(
        const Intrinsics& intrin,
        const Matching&   matches,
        const Features&   featuresLeft,
        const Features&   featuresRight,
        cv::Matx34f&      Pleft,
        cv::Matx34f&      Pright) {

    //Note: assuming fx = fy
    const double focal = intrin.K.at<float>(0, 0);
    const cv::Point2d pp(intrin.K.at<float>(0, 2),
                         intrin.K.at<float>(1, 2));

    //align left and right point sets using the matching
    Features left;
    Features right;
    GetAlignedPointsFromMatch(
            featuresLeft,
            featuresRight,
            matches,
            left,
            right);

    //find essential matrix
    Mat E, mask;
    E = findEssentialMat(
            left.points,
            right.points,
            focal,
            pp,
            RANSAC,
            0.999,
            1.0,
            mask);

    Mat_<double> R, t;

    //Find Pright camera matrix from the essential matrix
    recoverPose(E, left.points, right.points, R, t, focal, pp, mask);

    Pleft  = Matx34f::eye();
    Pright = Matx34f(R(0,0), R(0,1), R(0,2), t(0),
                     R(1,0), R(1,1), R(1,2), t(1),
                     R(2,0), R(2,1), R(2,2), t(2));
}
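The GetAlignedPointsFromMatch helper is not shown here. A minimal sketch of it, following the same queryIdx/trainIdx alignment loop we used earlier (the full implementation may also copy the keypoints and descriptors), could look like this:

void GetAlignedPointsFromMatch(
        const Features& leftFeatures,
        const Features& rightFeatures,
        const Matching& matches,
        Features&       alignedLeft,
        Features&       alignedRight) {
    alignedLeft.points.clear();
    alignedRight.points.clear();
    for (size_t i = 0; i < matches.size(); i++) {
        //queryIdx indexes the left features, trainIdx the right features
        alignedLeft.points.push_back(leftFeatures.points[matches[i].queryIdx]);
        alignedRight.points.push_back(rightFeatures.points[matches[i].trainIdx]);
    }
}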
At this point, we have the two cameras that we need in order to reconstruct the scene: the canonical first camera in the Pleft variable, and the second camera, which we calculated from the essential matrix, in the Pright variable.
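As a short, hypothetical usage example (assuming the intrinsics, features, and matching were obtained as shown in the previous sections), the function can be called like this:

Intrinsics intrinsics;
// ... fill K, Kinv, and distortion from the camera calibration

Features leftFeatures, rightFeatures;
Matching matches;
// ... extract features and match them as shown earlier

Matx34f Pleft, Pright;
findCameraMatricesFromMatch(
        intrinsics, matches, leftFeatures, rightFeatures, Pleft, Pright);
//Pleft and Pright are now ready to be used for triangulation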