$\begingroup$

From a simulation with Houdini software I retrieved these parameters:

Camera position: -0.675839, 33.5945, 0.0318854
Camera Rotation: 0.0318854, 92.4693, 0.1
Focal Length: 52.5172 mm
Image size: 1920 x 1080
Pixel Aspect Ratio: 1
x aperture = 41.4214 mm

I am not sure how to build the system that transforms 3D coordinates of the scene into 2D image coordinates, in particular how to derive and combine the rotation-translation matrix, the camera matrix, and the image transformation.

Do you have any suggestions? Thank you.

$\endgroup$
  • $\begingroup$ I have been able to implement the system (thanks to @tolga for his help). I now have a test point with these coordinates: 3d coords = [0.275501, -0.284077, 5.04747], img_coords = [1001.5, 148], but I cannot reproduce the correspondence. I am not sure that the rotation angles form a Rodrigues vector, so this could be the reason. Thanks for your help anyway. $\endgroup$ Commented Jun 16, 2017 at 9:59
  • $\begingroup$ They also look a bit like angles in degrees (~91 degrees?) :) You should check this in the documentation. $\endgroup$ Commented Jun 16, 2017 at 13:41

1 Answer

$\begingroup$

This is a long topic to explain fully. I will try to keep it short, so please excuse the brevity.

The standard computer vision projection (ignoring lens distortion, as Houdini does) follows:

$$ \mathbf{x} = \lambda \mathbf{K}[\mathbf{R}\mathbf{X} +\mathbf{t} ] $$

$\mathbf{R}$ is a $3\times3$ rotation (orthogonal) matrix and $\mathbf{t}$ is a $3\times1$ translation vector. The camera position $\mathbf{C}$ is given by $\mathbf{C}=-\mathbf{R}^T\mathbf{t}$, or equivalently $\mathbf{t}=-\mathbf{R}\mathbf{C}$. $\lambda = 1/z$ is the scale factor used for de-homogenization, where $z$ is the third component of $\mathbf{K}[\mathbf{R}\mathbf{X} +\mathbf{t}]$. The projection can also be written compactly with a $3\times4$ matrix: $$ \begin{align} \mathbf{x} &= \lambda\mathbf{P}[\mathbf{X}^T 1]^T\\ \mathbf{P} &= \mathbf{K}[\mathbf{R} | \mathbf{t}] \end{align} $$
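As a minimal sketch of the projection above (the function name `project_point` and the numeric values of `K`, `R` and `t` are illustrative assumptions, not your Houdini values), the $3\times4$ matrix can be built and applied with NumPy like this:

```python
import numpy as np

def project_point(K, R, t, X):
    """Project a 3D world point X to pixel coordinates via x ~ K (R X + t)."""
    X = np.asarray(X, dtype=float).reshape(3)
    t = np.asarray(t, dtype=float).reshape(3, 1)
    P = K @ np.hstack([R, t])            # 3x4 projection matrix K [R | t]
    x_h = P @ np.append(X, 1.0)          # homogeneous image point
    if x_h[2] <= 0:
        raise ValueError("point is not in front of the camera")
    return x_h[:2] / x_h[2]              # de-homogenize (multiply by lambda = 1/z)

# Illustrative values only: identity pose and a made-up intrinsic matrix.
K = np.array([[1000.0, 0.0, 960.0],
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)
uv = project_point(K, R, t, [0.5, -0.25, 5.0])
print(uv)
```

The check on the third homogeneous component is a design choice of this sketch: a point behind the camera would otherwise still produce pixel coordinates that look plausible.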

Rotation is sometimes represented as a Rodrigues (axis-angle) vector. To convert between that representation and a rotation matrix you can use cv::Rodrigues. Sometimes (for some unknown reason!) people choose Euler angles for the rotation instead. You should check which one Houdini uses, hopefully Rodrigues; if not, you should compose $\mathbf{R}$ from the 3 angles. To obtain $\mathbf{t}$ explicitly, plug $\mathbf{C}$ and $\mathbf{R}$ into $\mathbf{t}=-\mathbf{R}\mathbf{C}$, which follows from requiring the camera center to map to the origin of the camera frame: $\mathbf{R}\mathbf{C}+\mathbf{t}=\mathbf{0}$.
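If the three numbers turn out to be Euler angles, a sketch of composing $\mathbf{R}$ and solving for $\mathbf{t}$ could look like the following. Several things here are assumptions to verify against the Houdini documentation: that the angles are in degrees, that the composition order is $R_z R_y R_x$, and that the result is the world-to-camera rotation (if the angles describe the camera's orientation in the world, use `R.T` instead). Houdini's camera axis convention may also differ from the computer vision one, which would need an extra axis flip. The helper name `euler_to_matrix` is hypothetical.

```python
import numpy as np

def euler_to_matrix(rx_deg, ry_deg, rz_deg):
    """Compose R = Rz @ Ry @ Rx from angles in degrees (order is an assumption)."""
    rx, ry, rz = np.radians([rx_deg, ry_deg, rz_deg])
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    Ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    Rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    return Rz @ Ry @ Rx

# Values from the question; their interpretation as Euler angles is assumed.
C = np.array([-0.675839, 33.5945, 0.0318854])   # camera position
R = euler_to_matrix(0.0318854, 92.4693, 0.1)    # assumed world-to-camera rotation
t = -R @ C                                      # so that R @ C + t = 0
print(R)
print(t)
```

A quick sanity check for whichever convention you end up with is that the camera position itself must map to the origin of the camera frame, i.e. `R @ C + t` should be numerically zero.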

The only remaining unknown is then the $3\times3$ camera intrinsic matrix, $\mathbf{K}$. In computer vision:

$$ \mathbf{K} = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \\ \end{bmatrix} $$

The principal point $(c_x, c_y)$ is commonly initialized to the image center $(\frac{w}{2},\frac{h}{2})$, which you can obtain from the image size. Usually the skew $s=0$. The focal length $f$ is a bit more complicated:

Standard vision libraries (e.g. OpenCV) store the focal length in pixels, as in $\mathbf{K}$. In general, the relation to the physical focal length is: $$ f_{in-pixels} = \frac{w \cdot f_{mm}}{w_{CCD-in-mm}} $$ where $w$ is the image width in pixels and $w_{CCD-in-mm}$ is the physical sensor width. In your case, I suspect that the aperture is $w_{CCD-in-mm}$ and that the focal length is $f_{mm}$. If you plug in these values you will get the focal length in pixels: $f=f_{in-pixels}$. Not sure about your units though.
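As a worked example with the numbers from the question, assuming the x aperture really is the sensor width in mm and the focal length is in mm as well:

$$ f_{in-pixels} = \frac{1920 \cdot 52.5172}{41.4214} \approx 2434.3 $$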

Moreover, I assume that a pixel aspect ratio of 1 together with x aperture $= 41.4214$ means a camera with square pixels. In that case it makes no difference if you do the same calculation using the height instead of the width: $f_x=f_y=f$.
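Putting the intrinsics together, a sketch under the same assumptions (aperture is the sensor width in mm, square pixels, principal point at the image center, zero skew) might be the following. The function name `intrinsics_from_aperture` is hypothetical, and the sensor height is not given in the question; it is derived here from the image aspect ratio.

```python
import numpy as np

def intrinsics_from_aperture(width_px, height_px, focal_mm, aperture_x_mm):
    """Build K assuming square pixels, zero skew and a centered principal point."""
    # Assumed sensor height: same aspect ratio as the image (square pixels).
    aperture_y_mm = aperture_x_mm * height_px / width_px
    fx = width_px * focal_mm / aperture_x_mm     # focal length in pixels from width
    fy = height_px * focal_mm / aperture_y_mm    # same calculation from height
    return np.array([[fx, 0.0, width_px / 2.0],
                     [0.0, fy, height_px / 2.0],
                     [0.0, 0.0, 1.0]])

# Values from the question.
K = intrinsics_from_aperture(1920, 1080, 52.5172, 41.4214)
print(K)
```

With square pixels the width-based and height-based focal lengths coincide, so `K[0, 0]` and `K[1, 1]` come out equal.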

I hope this clarifies things a bit. We can keep refining this answer if you provide more info.

$\endgroup$
  • $\begingroup$ Yes, both aperture and focal length are expressed in mm and the aspect ratio is equal to 1 (I have adjusted the question). My main doubt concerned the computation of the rotation and translation operators. Your hints are very useful (I did not know about the Rodrigues vector for representing 3D rotations). I am going to try to implement the system and I will let you know. $\endgroup$ Commented Jun 16, 2017 at 7:53
