$\begingroup$

The question considers a very specific scenario: we have an image containing, say, two rectangular objects, and we know the width and height of one of them. How can we calculate the dimensions of the other object?

Let's assume that the image plane of the camera (that is, the orientation at which the photo is taken) is not parallel to the surface of the objects in the image.

Ideally, I am planning to build something like this: https://www.youtube.com/watch?v=jmoPCN2NM78

My intuition is to use trigonometric formulas to compute the projection of the image onto a surface parallel to the camera, but I could not reach any concrete and convincing conclusion.

Any help in the form of formulas, research papers, or detailed information on how I can achieve this will be heartily appreciated.

$\endgroup$
  • $\begingroup$ Is more than one photo taken (from different positions, so that one can do photogrammetry)? Do you get range information? Do you use structured 3D lighting? You'll need to say quite a bit more about exactly what data you take, because you need to infer range information somehow to make this kind of idea work. $\endgroup$ Commented Oct 15, 2015 at 13:14
  • $\begingroup$ Okay. I will have only one photo, not more. I don't have any additional data; the situation is the same as taking a photo with a mobile phone, for example. I am not using structured 3D lighting. I only care about two dimensions (height and width), and I don't have any spherical objects either. I think the link I provided may help you understand the problem statement better. $\endgroup$ Commented Oct 15, 2015 at 13:37
  • $\begingroup$ You want the inverse of what artists do if they want correct proportions eprints.fortlewis.edu/27/1/… $\endgroup$
    – anna v
    Commented Oct 15, 2015 at 14:08

1 Answer

$\begingroup$

This kind of problem is very common in computer graphics. What I will explain only works if the distance you want to measure and your reference object lie in the same plane (for example, a sheet of paper on the table you want to measure).
If you knew how this plane lies in the image, that is, how the points of the plane are mapped to points in the image, then it would be easy. Fortunately this task already has a good answer here! Let's apply this to measuring the distance between two points in the image. Take this screenshot from your video as an example:

[Image: the table]

The corners of the sheet of paper are $p_1 = (496, 255)$, $p_2 = (607, 224)$, $p_3 = (508, 171)$, and $p_4 = (405, 194)$, and the corners of the table are $q_1 = (7,244)$, $q_2 = (654, 389)$, $q_3 = (860, 47)$, and $q_4 = (511, 23)$ (all in pixels). The corners of a DIN A3 sheet, with the short side of 0.297 m rounded to 0.3 m, are $e_1 = (0, 0)$, $e_2 = (0.3, 0)$, $e_3 = (0.3, 0.42)$, and $e_4 = (0, 0.42)$ (all in m).
Now construct the transformation matrices according to the reference. For us the "destination image" is the sheet of paper, and the "source image" is the given one of the table. Note that the entries of $A$ below are consistent with the pixel $y$-coordinates having been flipped, $y \mapsto 422 - y$ (that is, an image height of 422 pixels), before building the matrices; the same flip has to be applied to the table corners before multiplying by $C$. The final distances do not depend on this choice as long as it is applied consistently. The transformation matrices are:

$$A=\left( \begin{array}{ccc} -400.027 & 530.049 & 377.977 \\ -134.686 & 172.899 & 212.787 \\ -0.806505 & 0.873228 & 0.933278 \\ \end{array} \right),$$

$$B=\left( \begin{array}{ccc} 0. & 0.3 & 0. \\ 0. & 0. & 0.42 \\ -1. & 1. & 1. \\ \end{array} \right),$$

$$\text{and } C=B A^{-1}=\left( \begin{array}{ccc} 0.00218481 & 0.00325931 & -1.62797 \\ -0.00145442 & 0.00520777 & -0.148304 \\ -0.0000581739 & -0.00284785 & 1.74436 \\ \end{array} \right).$$

The matrix $C$ transforms from pixel coordinates to "sheet of paper" coordinates. Pay attention to the fact that we are using homogeneous coordinates. To transform the table corners to sheet coordinates we first apply $C$:

$C (q_1,1)=(-1.03252, 0.768499, 1.23704)$, $C (q_2,1)=(-0.0915473, -0.927641, 1.61234)$, $C (q_3,1)=(1.47321, 0.553807, 0.62639)$, and $C (q_4,1)=(0.788933, 1.18639, 0.578344)$.

Then we dehomogenize, that is, divide the first two components by the third, to obtain

$q_1''=(-0.834672, 0.621241)$, $q_2''=(-0.0567792, -0.575339)$, $q_3''=(2.35191, 0.884125)$, and $q_4''=(1.36412, 2.05135)$.

From these we can directly calculate the distances (in m) and find $d(q_1'',q_2'')=1.43$ (should be $1.53$), $d(q_2'',q_3'')=2.82$ (should be $2.73$), $d(q_3'',q_4'')=1.53$ (should be $1.53$), and $d(q_4'',q_1'')=2.62$ (should be $2.73$).

These values are not perfect. A likely reason is that I didn't pick the points in the image accurately enough. As was pointed out in the video, it is crucial to choose the reference points as accurately as possible; playing around with those values shows that they have a huge effect on the final result. This method also does not take lens distortion into account.
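The calculation above can be sketched in a few lines of plain Python. This is only an illustration, not the code used for the numbers in this answer: the helper names (`inv3`, `basis_matrix`, `homography`, `to_plane`) are made up for the example, and it assumes the four-point construction in which each point set defines a matrix sending the projective basis to those points. It works directly on the pixel coordinates as listed, so the intermediate matrices may differ from $A$, $B$ and $C$ above (a consistent change of image coordinates or of point ordering changes the matrices but not the mapped points), while the resulting distances should agree.

```python
import math

def inv3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[x / det for x in row] for row in adj]

def matvec(m, v):
    return [sum(m[r][k] * v[k] for k in range(3)) for r in range(3)]

def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]

def basis_matrix(pts):
    """Matrix sending (1,0,0), (0,1,0), (0,0,1), (1,1,1) to the four points,
    all understood in homogeneous coordinates."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = pts
    m = [[x1, x2, x3], [y1, y2, y3], [1.0, 1.0, 1.0]]
    # Scale the columns so that their sum is the fourth point.
    lam = matvec(inv3(m), [x4, y4, 1.0])
    return [[m[r][c] * lam[c] for c in range(3)] for r in range(3)]

def homography(src, dst):
    """Projective map taking the four src points to the four dst points."""
    return matmul(basis_matrix(dst), inv3(basis_matrix(src)))

def to_plane(h, pt):
    """Apply h to a 2D point and dehomogenize."""
    x, y, w = matvec(h, [pt[0], pt[1], 1.0])
    return (x / w, y / w)

# Point values taken from the answer: paper corners (pixels), paper corners
# (metres), and table corners (pixels).
paper_px = [(496, 255), (607, 224), (508, 171), (405, 194)]
paper_m = [(0.0, 0.0), (0.3, 0.0), (0.3, 0.42), (0.0, 0.42)]
table_px = [(7, 244), (654, 389), (860, 47), (511, 23)]

C = homography(paper_px, paper_m)
table_m = [to_plane(C, q) for q in table_px]
# Side lengths q1-q2, q2-q3, q3-q4, q4-q1 in metres.
sides = [math.dist(table_m[k], table_m[(k + 1) % 4]) for k in range(4)]
print([round(s, 2) for s in sides])
```

Changing one of the `paper_px` entries by a pixel or two and rerunning is a quick way to see how sensitive the side lengths are to the choice of reference points.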

$\endgroup$
