Exploring Structure from Motion Using OpenCV
In this chapter, we will discuss the notion of Structure from Motion (SfM), or, put another way, extracting geometric structure from images taken with a camera under motion, using OpenCV's API to help us. First, let's constrain the otherwise very broad approach to SfM: we will use a single camera, usually called a monocular approach, and a discrete, sparse set of frames rather than a continuous video stream. These two constraints will greatly simplify the system we sketch out in the coming pages and help us understand the fundamentals of any SfM method. To implement our method, we will follow in the footsteps of Hartley and Zisserman (hereafter referred to as H&Z for brevity), as documented in Chapters 9 through 12 of their seminal book, Multiple View Geometry in Computer Vision.
In this chapter, we will cover the following:
- Structure from Motion concepts
- Estimating the camera motion from a pair of images
- Reconstructing the scene
- Reconstructing from many views
- Refining the reconstruction
Throughout the chapter, we assume the use of a calibrated camera, one that was calibrated beforehand. Calibration is a ubiquitous operation in Computer Vision, fully supported in OpenCV using command-line tools, and was discussed in previous chapters. We, therefore, assume the existence of the camera's intrinsic parameters, embodied in the K matrix, and the distortion coefficients vector, which are the outputs of the calibration process.
To make things clear in terms of language, from this point on, we will refer to a camera as a single view of the scene rather than to the optics and hardware taking the image. A camera has a 3D position in space (translation) and a 3D direction of view (orientation). Together, these are described as the six Degrees of Freedom (DOF) camera pose, sometimes referred to as the extrinsic parameters. Between two cameras, therefore, there is a 3D translation element (movement through space) and a 3D rotation of the direction of view.
We will also unify the terms scene point, world point, real point, and 3D point to mean the same thing: a point that exists in our real world. The same goes for image points or 2D points, which are points, in image coordinates, where some real 3D point was projected onto the camera sensor at that location and time.
In the chapter's code sections, you will notice references to Multiple View Geometry in Computer Vision, for example, // HZ 9.12. This refers to equation number 12 of Chapter 9 of the book. Also, the text includes excerpts of code only; the complete runnable code is included in the material accompanying the book.
The following flow diagram describes the process of the SfM pipeline we will implement. We begin by triangulating an initial reconstructed point cloud of the scene, using 2D features matched across the image set and a calculation of two camera poses. We then add more views to the reconstruction by matching further points against the growing point cloud, calculating their camera poses, and triangulating their matched points. In between, we also perform bundle adjustment to minimize the error in the reconstruction. All the steps are detailed in the following sections of this chapter, with relevant code excerpts, pointers to useful OpenCV functions, and mathematical reasoning:
