This is a long topic to explain fully. I will try to keep it short, so please excuse the brevity.
Standard computer vision projection (ignoring lens distortion, as Houdini does) follows:
$$
\mathbf{x} = \lambda \mathbf{K}[\mathbf{R}\mathbf{X} +\mathbf{t} ]
$$
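$\mathbf{R}$ is a $3\times3$ rotation matrix (orthogonal, with determinant $+1$) and $\mathbf{t}$ is a $3\times1$ translation vector. The camera position $\mathbf{C}$ in world coordinates is given by $\mathbf{C}=-\mathbf{R}^T\mathbf{t}$. $\lambda = 1/z$, where $z$ is the third component of $\mathbf{K}[\mathbf{R}\mathbf{X}+\mathbf{t}]$, is the scale factor used for de-homogenization, so that $\mathbf{x}$ ends up as $[u\ v\ 1]^T$. The projection can also be written compactly with a single $3\times4$ matrix: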
$$
\begin{align}
\mathbf{x} &= \lambda\mathbf{P}[\mathbf{X}^T 1]^T\\
\mathbf{P} &= \mathbf{K}[\mathbf{R} | \mathbf{t}]
\end{align}
$$
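A minimal NumPy sketch of the compact form: it builds $\mathbf{P}=\mathbf{K}[\mathbf{R}|\mathbf{t}]$ and de-homogenizes by dividing by the third coordinate (the same as multiplying by $\lambda$). The function names and the values of `K`, `R` and `t` are hypothetical toy numbers, not anything taken from Houdini.

```python
import numpy as np

def projection_matrix(K, R, t):
    """Build the 3x4 matrix P = K [R | t]."""
    return K @ np.hstack([R, t.reshape(3, 1)])

def project(P, X):
    """Project a 3D world point X (shape (3,)) to pixel coordinates (u, v)."""
    X_h = np.append(X, 1.0)      # homogeneous world point [X^T 1]^T
    x = P @ X_h                  # un-normalized image point
    return x[:2] / x[2]          # multiply by lambda = 1 / z

# Toy example: identity rotation, camera 5 units back along the optical axis.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 5.0])
P = projection_matrix(K, R, t)
print(project(P, np.array([1.0, 0.5, 0.0])))  # -> [480. 320.]
```

A point on the optical axis (here the world origin) lands on the principal point $(c_x, c_y)$, which is a quick way to sanity-check a converted camera.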
Rotation is sometimes represented as a Rodrigues (axis-angle) vector; to convert between that and $\mathbf{R}$ you can use cv::Rodrigues. Sometimes (for some unknown reason!) people choose Euler angles instead. You should check which representation Houdini uses: hopefully Rodrigues; if not, you have to compose $\mathbf{R}$ from the 3 angles, paying attention to the rotation order. To obtain $\mathbf{t}$ explicitly, plug $\mathbf{C}$ and $\mathbf{R}$ into $\mathbf{C}=-\mathbf{R}^T\mathbf{t}$ and solve for $\mathbf{t}$, which gives $\mathbf{t}=-\mathbf{R}\mathbf{C}$.
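Here is a NumPy sketch of both rotation routes plus the translation step, with $\mathbf{R}$ mapping world to camera coordinates so that $\mathbf{C}=-\mathbf{R}^T\mathbf{t}$ and hence $\mathbf{t}=-\mathbf{R}\mathbf{C}$. The Rodrigues helper reimplements the axis-angle formula so it runs without OpenCV. The Euler helper *assumes* degrees and an X-then-Y-then-Z order; Houdini's actual order and sign conventions, and whether its angles describe camera-to-world orientation (in which case you need the transpose), are assumptions you should verify. The sketch also ignores any axis flip between the two packages' camera conventions (for example, if one camera looks down $-Z$ and the other down $+Z$). All function names are hypothetical.

```python
import numpy as np

def rodrigues_to_matrix(rvec):
    """Rotation vector (unit axis * angle in radians) -> 3x3 rotation matrix."""
    rvec = np.asarray(rvec, dtype=float)
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)                    # no rotation
    k = rvec / theta                        # unit rotation axis
    Kx = np.array([[0.0, -k[2], k[1]],
                   [k[2], 0.0, -k[0]],
                   [-k[1], k[0], 0.0]])     # cross-product matrix of k
    return np.eye(3) + np.sin(theta) * Kx + (1.0 - np.cos(theta)) * (Kx @ Kx)

def euler_xyz_to_matrix(rx, ry, rz):
    """Compose R from three angles in degrees.

    ASSUMPTION: rotate about X first, then Y, then Z (R = Rz @ Ry @ Rx
    acting on column vectors). Check the order your package really uses.
    """
    ax, ay, az = np.radians([rx, ry, rz])
    Rx = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(ax), -np.sin(ax)],
                   [0.0, np.sin(ax), np.cos(ax)]])
    Ry = np.array([[np.cos(ay), 0.0, np.sin(ay)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(ay), 0.0, np.cos(ay)]])
    Rz = np.array([[np.cos(az), -np.sin(az), 0.0],
                   [np.sin(az), np.cos(az), 0.0],
                   [0.0, 0.0, 1.0]])
    return Rz @ Ry @ Rx

def translation_from_center(R, C):
    """Solve C = -R^T t for t, i.e. t = -R C (R maps world -> camera)."""
    return -R @ np.asarray(C, dtype=float)

# Hypothetical usage: angles and position are made-up values.
R_cam = euler_xyz_to_matrix(0.0, 30.0, 0.0)  # camera orientation in the world (assumed)
R = R_cam.T                                  # world -> camera rotation
C = np.array([0.0, 1.0, 5.0])
t = translation_from_center(R, C)
```

A rotation of 90° about $z$ gives the same matrix through either helper, which is a handy check that the two representations agree.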
The only remaining unknown is then the $3\times3$ camera intrinsic matrix, $\mathbf{K}$, which in computer vision is usually written as:
$$
\mathbf{K} =
\begin{bmatrix}
f_x & s & c_x \\
0 & f_y & c_y \\
0 & 0 & 1 \\
\end{bmatrix}
$$
The principal point $(c_x, c_y)$ is often initialized to the image center $(\frac{w}{2},\frac{h}{2})$, which you can obtain from the image size $w \times h$ in pixels. Usually the skew is $s=0$. The focal length $f$ is just a bit more complicated.
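A small sketch that assembles $\mathbf{K}$ with the principal point at the image center and zero skew. The function name and the example focal length and image size are hypothetical; the focal length in pixels still has to be computed as described next.

```python
import numpy as np

def intrinsic_matrix(f_x, f_y, width, height, skew=0.0):
    """Build K with the principal point initialized to (w/2, h/2)."""
    c_x, c_y = width / 2.0, height / 2.0
    return np.array([[f_x, skew, c_x],
                     [0.0, f_y, c_y],
                     [0.0, 0.0, 1.0]])

# Made-up values: 1000 px focal length, 1920x1080 image.
K = intrinsic_matrix(1000.0, 1000.0, 1920, 1080)
```

If you later calibrate the camera properly, the calibrated $(c_x, c_y)$ should replace this initial guess.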
Standard vision libraries (e.g. OpenCV) store the focal length in pixels, as in $\mathbf{K}$. In general, the relation to physical focal length is:
$$
f_{\text{in-pixels}} = \frac{w \cdot f_{\text{mm}}}{w_{\text{CCD-in-mm}}}
$$
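where $w$ is the image width in pixels and $w_{\text{CCD-in-mm}}$ is the physical sensor width. In your case, I suspect that the aperture corresponds to $w_{\text{CCD-in-mm}}$ and the focal length to $f_{\text{mm}}$. Plugging these values in gives the focal length in pixels, $f=f_{\text{in-pixels}}$. I am not sure about your units, though; what matters is that $f_{\text{mm}}$ and $w_{\text{CCD-in-mm}}$ are expressed in the same unit.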
Moreover, I assume that by $1 \times \text{aperture} = 41.4214$ you mean a camera with square pixels (pixel aspect ratio 1). In that case it makes no difference if you do the same calculation with the height instead of the width (using the sensor height $w_{\text{CCD-in-mm}} \cdot h / w$): $f_x=f_y=f$.
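The conversion can be checked numerically with a short sketch. The aperture value 41.4214 comes from the question; the 50 mm focal length and the 1920×1080 image size are *assumed* example values, and the function name is hypothetical. It also repeats the calculation with the height to show that square pixels give $f_x = f_y$, and reports the horizontal field of view as a sanity check.

```python
import math

def focal_length_pixels(f_mm, image_size_px, sensor_size_mm):
    """f_px = w * f_mm / w_sensor; f_mm and sensor_size_mm must share a unit."""
    return image_size_px * f_mm / sensor_size_mm

# Assumed example values (only the aperture comes from the question).
f_mm, aperture = 50.0, 41.4214
width, height = 1920, 1080

f_x = focal_length_pixels(f_mm, width, aperture)

# Square pixels: the sensor height scales like the image height,
# so the same calculation with the height gives the same focal length.
sensor_height = aperture * height / width
f_y = focal_length_pixels(f_mm, height, sensor_height)

# Horizontal field of view implied by f_x.
hfov_deg = math.degrees(2.0 * math.atan(width / (2.0 * f_x)))
print(f"f_x = {f_x:.2f}, f_y = {f_y:.2f}, hfov = {hfov_deg:.2f} deg")
# prints: f_x = 2317.64, f_y = 2317.64, hfov = 45.00 deg
```

With these assumed numbers the aperture and focal length correspond to a 45° horizontal field of view; if your render resolution differs, only $w$ (and thus $f$ in pixels) changes, not the field of view.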
I hope this clarifies a bit. Let's keep refining this answer if you provide more info.