
I am trying to get the hang of camera calibration. For this purpose, I have mainly watched this lecture on YouTube: Lecture 12: Camera Model by UCF CRCV. I believe that I understood the main ideas:

We basically deal with the question of how to relate 3D world points to 2D image points. To do this:

  1. We take the 3D point W in homogeneous coordinates (in a defined world coordinate system)
  2. Apply transformations (rotation, translation) to bring the camera and world coordinate systems into alignment
  3. Use perspective projection (from the pinhole camera model) to calculate the image point I.

If we know neither the perspective projection (the intrinsic parameters) nor the camera's position and orientation in the world (the extrinsic parameters), we can estimate them (= calibration) by using an object with easily identifiable points in the world, locating those points in the captured image, and building a system of equations that we can then solve using least squares.
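
In code, I picture the pipeline roughly like this minimal NumPy sketch (the intrinsics, pose, and point are made-up values):

```python
import numpy as np

# Intrinsic matrix K (perspective projection from the pinhole model)
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Extrinsics: rotation R and translation t (world -> camera)
R = np.eye(3)
t = np.array([[0.0], [0.0], [5.0]])   # camera 5 units from the world origin

# 1. 3D world point W in homogeneous coordinates
W = np.array([[1.0], [2.0], [0.0], [1.0]])

# 2. + 3. Align the coordinate systems and project: P = K [R | t]
P = K @ np.hstack([R, t])
I_h = P @ W                  # homogeneous image point
I = I_h[:2] / I_h[2]         # dehomogenize to pixel coordinates
print(I.ravel())             # [480. 560.]
```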

I have now noticed that many calibration tools actually require multiple images to calibrate reliably. Concerning this, I have a couple of questions:

  1. How do we relate the multiple images we might have taken of the same object that we have moved? I have seen that the world origin is often placed at a corner of the chessboard, but this origin will change if you move the chessboard. Don't the origins have to match, so that we can calculate the extrinsic parameters over multiple images?
  2. Why exactly is it better to use multiple images? Couldn't you just use a big chessboard that gives you a sufficient number of points to solve the system of equations?

2 Answers


You are correct: to calibrate a camera you need correspondences between 3D world points and 2D image points. The problem is that the 3D points cannot all be co-planar, so people used to build 3D calibration rigs, e.g. a box made of checkerboards. One image of a rig like that would be enough to calibrate, but such rigs are hard to build, because you have to get the planes to be exactly at right angles to each other.

Then Zhengyou Zhang came up with an algorithm to calibrate from multiple images of a planar pattern, like a checkerboard. You still need non-coplanar points, but you get them from multiple images of a planar pattern in different 3D orientations, rather than from a single image of a 3D rig.

Noise and errors are another reason why you need multiple images. You cannot detect the checkerboard corners with infinite precision, so you need many points to estimate the camera parameters reliably.
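
To illustrate, here is a rough sketch of the multi-image workflow with OpenCV (whose calibration is based on Zhang's method); the `calib_*.png` file pattern and the 9×6 inner-corner grid are assumptions about the setup:

```python
import glob

import cv2
import numpy as np

pattern = (9, 6)  # inner corners per row and column (assumed)

# The same planar model (Z = 0) is reused for every view; its origin is
# simply one corner of the board, declared anew for each image.
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

objpoints, imgpoints = [], []  # 3D model points and 2D detections per view
for fname in glob.glob("calib_*.png"):
    gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        objpoints.append(objp)
        imgpoints.append(corners)

# One intrinsic matrix and distortion vector are fitted to all views at once;
# each view gets its own rotation/translation (extrinsics).
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    objpoints, imgpoints, gray.shape[::-1], None, None)
print("RMS reprojection error:", rms)
```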

  • Hi Dima! Thanks for your answer. I still have not understood how to relate the coordinate systems over the multiple images. Is that something that was only introduced in the paper you mentioned? Before that, people really used one image? Also, why can't the points be co-planar? – Commented Jul 2, 2015 at 6:20

So here are the answers to the questions:

  1. It's a good question. First, your calibration grid should somehow be "coded". For example, the OpenCV checkerboard pattern is a rectangle, and the detected points are sorted from the upper left to the lower right. This way, you find the exact correspondences between your 3D model points and your 2D image points.

For multiple views, the origin doesn't matter. Think of it as taking multiple shots of a static calibration pattern and looking for the best intrinsic matrix to project them all. The poses (homographies) can vary, but the intrinsics stay the same. You would optimize for the intrinsics and later on for the extrinsics. Use the relation

$$H = A \left[ \begin{matrix} r_1 & r_2 & t \end{matrix} \right],$$

where $A$ is the intrinsic matrix, $r_1$ and $r_2$ are the first two columns of the rotation, and $t$ is the translation; since the pattern lies in its own plane $Z = 0$, the third column $r_3$ drops out of the projection.

Using the fact that the pattern plane (whose normal is $r_3$) intersects the plane at infinity in a line, one can see that the circular points are two particular points on that line. If you compute the intersection of that line with the absolute conic, you get rid of the pose parameters in the optimization: the resulting constraints involve only the intrinsics.
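
As a concrete illustration of this relation, here is a small NumPy sketch (all intrinsics and pose values are made up) that builds a homography from a known pose via $H = A \left[ r_1 \; r_2 \; t \right]$ and then recovers the pose from it:

```python
import numpy as np

# Made-up intrinsic matrix A
A = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

theta = np.deg2rad(20.0)                        # board tilted 20 degrees
R = np.array([[ np.cos(theta), 0, np.sin(theta)],
              [ 0,             1, 0            ],
              [-np.sin(theta), 0, np.cos(theta)]])
t = np.array([0.1, -0.2, 5.0])

H = A @ np.column_stack([R[:, 0], R[:, 1], t])  # H = A [r1 r2 t]

# Recover the pose: A^{-1} H = lambda [r1 r2 t]; fix lambda so |r1| = 1
M = np.linalg.inv(A) @ H
lam = 1.0 / np.linalg.norm(M[:, 0])
r1, r2 = lam * M[:, 0], lam * M[:, 1]
r3 = np.cross(r1, r2)                           # rotation columns are orthonormal
R_rec = np.column_stack([r1, r2, r3])
t_rec = lam * M[:, 2]
print(np.allclose(R_rec, R), np.allclose(t_rec, t))  # True True
```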

For more information, check section 2.4, here.

  2. You cannot simply use a big chessboard. You should have 3D variation in your setup so that the system is well conditioned and degeneracies do not occur (see the sketch below). However, you could simply use a 3D calibration rig and capture a single image (as Tsai does).
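
To see the degeneracy numerically, here is a hedged NumPy sketch of the linear system in Zhang's method (all poses are made up): each view's homography contributes only two constraints on the six entries of $B = A^{-\top}A^{-1}$, which has five degrees of freedom up to scale, so one planar view, however large the board, cannot determine the intrinsics.

```python
import numpy as np

# Made-up intrinsic matrix A
A = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

def v(H, i, j):
    # Zhang's v_ij vector built from columns i and j of H (0-based),
    # so that v(H, i, j) . b = h_i^T B h_j for b = (B11,B12,B22,B13,B23,B33)
    hi, hj = H[:, i], H[:, j]
    return np.array([hi[0]*hj[0],
                     hi[0]*hj[1] + hi[1]*hj[0],
                     hi[1]*hj[1],
                     hi[2]*hj[0] + hi[0]*hj[2],
                     hi[2]*hj[1] + hi[1]*hj[2],
                     hi[2]*hj[2]])

rng = np.random.default_rng(0)
rows = []
for _ in range(3):
    # random rotation via QR, random translation in front of the camera
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    t = np.array([rng.normal(), rng.normal(), 5.0])
    H = A @ np.column_stack([Q[:, 0], Q[:, 1], t])
    rows += [v(H, 0, 1), v(H, 0, 0) - v(H, 1, 1)]  # 2 constraints per view

V = np.array(rows)
print(np.linalg.matrix_rank(V[:2]))  # 2: one view leaves b underdetermined
print(np.linalg.matrix_rank(V))      # 5: three views pin b down (up to scale)
```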
