I am trying to get the hang of camera calibration. For this purpose, I have mainly watched this lecture on YouTube: Lecture 12: Camera Model by UCF CRCV. I believe I have understood the main ideas:
We are basically dealing with the question of how to relate 3D world points to 2D image points. To do this:
- We take the 3D point W in homogeneous coordinates (in a defined world coordinate system)
- Apply transformations (rotation, translation) to bring the camera and world coordinate systems into alignment
- Use perspective projection (from the pinhole camera model) to calculate the image point I (I have written this out as an equation below).
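In equation form, this is how I picture the whole pipeline (standard pinhole notation; the symbols K, R, t and the scale factor are my own labels for the intrinsic matrix, rotation, translation and projective depth, not taken from the lecture):

```latex
% Homogeneous world point W = (X, Y, Z, 1)^T maps to image point I = (u, v, 1)^T
\lambda
\begin{pmatrix} u \\ v \\ 1 \end{pmatrix}
=
\underbrace{\begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}}_{K\ \text{(intrinsics)}}
\;
\underbrace{\begin{pmatrix} R \mid t \end{pmatrix}}_{\text{extrinsics}}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}
```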
If we have neither the perspective projection (i.e. the intrinsic parameters) nor the camera's position and orientation in the world (the extrinsic parameters), we can estimate them (= calibration): we use an object with easily identifiable points in the world, locate those points in the captured image, and build a system of equations that we can then solve using least squares.
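For a single view, this is roughly how I imagine that least-squares step (a minimal sketch with numpy; the function name and the choice to estimate the full 3x4 projection matrix directly are my own, for illustration, and not necessarily what any particular tool does internally):

```python
import numpy as np

def estimate_projection_matrix(world_pts, image_pts):
    """Estimate the 3x4 projection matrix P from 3D-2D correspondences:
    each pair contributes two linear equations in the 12 entries of P,
    and the stacked homogeneous system is solved in a least-squares
    sense with the SVD (a basic DLT)."""
    assert len(world_pts) == len(image_pts) >= 6, "need at least 6 correspondences"
    rows = []
    for (X, Y, Z), (u, v) in zip(world_pts, image_pts):
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    A = np.asarray(rows, dtype=float)
    # The minimiser of ||A p|| subject to ||p|| = 1 is the right singular
    # vector belonging to the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)
```

My understanding is that P can then be split into intrinsics and extrinsics (e.g. via an RQ decomposition), but please correct me if real calibration tools work differently.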
I have now noticed that many calibration tools actually require multiple images to calibrate reliably (I have pasted the kind of call I mean at the end of this post, using OpenCV as an example). Concerning this, I have a couple of questions:
- How do we relate the multiple images we might have taken of the same object after moving it? I have seen that the world origin is often placed in a corner of the chessboard, but this origin will change if you move the chessboard. Don't the origins have to match so that we can calculate the extrinsic parameters across multiple images?
- Why exactly is it better to use multiple images? Couldn't you just use a big chessboard that gives you a sufficient number of points to solve the system of equations?
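For reference, this is the kind of multi-image call I mean (OpenCV's cv2.calibrateCamera; the chessboard dimensions, the square size, the file name pattern and the way I build the per-view object points are assumptions on my part):

```python
import glob
import cv2
import numpy as np

# Chessboard with 9x6 inner corners and 25 mm squares (assumed for illustration).
pattern_size = (9, 6)
square_size = 25.0

# The same 3D corner grid is reused for every view, with Z = 0 in the board plane.
board_pts = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
board_pts[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2) * square_size

object_points, image_points = [], []
for path in glob.glob("calib_*.png"):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern_size)
    if found:
        object_points.append(board_pts)   # identical board coordinates per view
        image_points.append(corners)      # detected 2D corners differ per view

# One shared camera matrix and distortion model, plus a separate rvec/tvec per image.
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    object_points, image_points, gray.shape[::-1], None, None)
```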