Exploring Structure from Motion Using OpenCV
In this chapter, we will discuss the notion of Structure from Motion (SfM), or, put another way, extracting geometric structure from images taken with a camera under motion, using OpenCV's API to help us. First, let's constrain the otherwise very broad approach to SfM: we will use a single camera, usually called a monocular approach, and a discrete, sparse set of frames rather than a continuous video stream. These two constraints will greatly simplify the system we sketch out in the coming pages and help us understand the fundamentals of any SfM method. To implement our method, we will follow in the footsteps of Hartley and Zisserman (hereafter referred to as H&Z for brevity), as documented in Chapters 9 through 12 of their seminal book, Multiple View Geometry in Computer Vision.
In this chapter, we will cover the following:
- Structure from Motion concepts
- Estimating the camera motion from a pair of images
- Reconstructing the scene
- Reconstructing from many views
- Refining the reconstruction
Throughout the chapter, we assume the use of a calibrated camera, one that was calibrated beforehand. Calibration is a ubiquitous operation in Computer Vision, fully supported in OpenCV using command-line tools, and was discussed in previous chapters. We, therefore, assume the existence of the camera's intrinsic parameters, embodied in the K matrix, and the distortion coefficients vector, which are the outputs of the calibration process.
To make things clear in terms of language, from this point on, we will refer to a camera as a single view of the scene rather than to the optics and hardware taking the image. A camera has a 3D position in space (translation) and a 3D direction of view (orientation). Together, these are described as the six Degrees of Freedom (DOF) camera pose, sometimes referred to as the extrinsic parameters. Between two cameras, therefore, there is a 3D translation element (movement through space) and a 3D rotation of the direction of view.
We will also unify the terms scene point, world point, real point, and 3D point to mean the same thing: a point that exists in our real world. The same goes for image points or 2D points, which are points, in image coordinates, where some real 3D point was projected onto the camera sensor at that location and time.
In the chapter's code sections, you will notice references to Multiple View Geometry in Computer Vision, for example, // HZ 9.12. This refers to equation number 12 of Chapter 9 of the book. Also, the text includes excerpts of code only; the complete runnable code is included in the material accompanying the book.
The following flow diagram describes the process of the SfM pipeline we will implement. We begin by triangulating an initial reconstructed point cloud of the scene, using 2D features matched across the image set and a calculation of two camera poses. We then add more views to the reconstruction by matching further points against the growing point cloud, calculating their camera poses, and triangulating their matched points. In between, we also perform bundle adjustment to minimize the error in the reconstruction. All the steps are detailed in the following sections of this chapter, with relevant code excerpts, pointers to useful OpenCV functions, and mathematical reasoning:
