Estimating the camera motion from a pair of images
Before we set out to actually find the motion between two cameras, let's examine the inputs and the tools we have at hand to perform this operation. First, we have two images of the same scene from (hopefully not extremely) different positions in space. This is a powerful asset, and we will make sure that we use it. As for tools, we should take a look at mathematical objects that impose constraints over our images, cameras, and the scene.
Two very useful mathematical objects are the fundamental matrix (denoted by F) and the essential matrix (denoted by E), which impose a constraint over corresponding 2D points in two images of the scene. They are mostly similar, except that the essential matrix is assuming usage of calibrated cameras; this is the case for us, so we will choose it. OpenCV allows us to find the fundamental matrix via the findFundamentalMat function and the essential matrix via the findEssentialMatrix function. Finding the essential matrix can be done as follows:
Mat E = findEssentialMat(leftPoints, rightPoints, focal, pp);
This function makes use of matching points in the left-hand side image, leftPoints, and right-hand side image, rightPoints, which we will discuss shortly, as well as two additional pieces of information from the camera's calibration: the focal length, focal, and principal point, pp.
The essential matrix E is a 3x3 sized matrix, which imposes the following constraint on a point x in one image and a point and a point x' corresponding image:
x'KTEKx = 0
Here, K is the calibration matrix.
This is extremely useful, as we are about to see. Another important fact we use is that the essential matrix is all we need in order to recover the two cameras' positions from our images, although only up to an arbitrary unit of scale. So, if we obtain the essential matrix, we know where each camera is positioned in space, and where it is looking. We can easily calculate the matrix if we have enough of those constraint equations, simply because each equation can be used to solve for a small part of the matrix. In fact, OpenCV internally calculates it using just five point-pairs, but through the Random Sample Consensus algorithm (RANSAC), many more pairs can be used and they make for a more robust solution.