$\begingroup$

I found the derivations for two-point and three-point perspective here on this site. Although the page says it reviews one-point perspective, it doesn't link to any previous pages. I'd like to know how the matrix for one-point perspective was derived. If possible, please give me a detailed derivation of this projection matrix.

$\endgroup$

1 Answer

$\begingroup$

When reading about transformation matrices across different sources, there are a few conventions that you must understand, which are as follows.

Column vs Row Notation

Some authors write vectors as a single-row matrix (1x3), while others write them as a single-column matrix (3x1). The latter seems to be the preferred representation in most modern books on computer graphics. The main difference between the two notations is that with row vectors, matrices are multiplied on the right side of the vector, and with column vectors they are multiplied on the left. A linear transformation, such as a rotation, that is written as a matrix in one notation is represented by the transposed matrix in the other. For instance, the rotation of the point $p=(1, 1)$ by an angle $\theta$ in the plane is shown below. As you can easily see, $Row(p)^T = Column(p)$, i.e., one notation is the transpose of the other.

$$ \begin{align} Row(p) &= \begin{bmatrix} 1 & 1 \end{bmatrix} \begin{bmatrix} \cos(\theta) & \sin(\theta)\\ -\sin(\theta) & \cos(\theta) \end{bmatrix}\\ Column(p) &= \begin{bmatrix} \cos(\theta) & -\sin(\theta)\\ \sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} \end{align} $$

The website you linked uses row notation. This is important because if you read about the perspective matrix in other sources, you may see the result written as the transpose matrix if the author used column notation.
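To make the transpose relationship concrete, here is a minimal sketch in plain Python. The helper names `matmul` and `transpose` and the choice $\theta = 90°$ are my own for illustration; they do not come from the linked website.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    """Swap the rows and columns of a matrix."""
    return [list(col) for col in zip(*m)]

theta = math.pi / 2  # assumed angle: 90 degrees counter-clockwise

# Rotation matrix in column notation (matrix goes on the left of the vector).
R_col = [[math.cos(theta), -math.sin(theta)],
         [math.sin(theta),  math.cos(theta)]]
# Row notation uses the transposed matrix (matrix goes on the right).
R_row = transpose(R_col)

p_row = [[1.0, 1.0]]      # 1x2 row vector
p_col = transpose(p_row)  # 2x1 column vector

row_result = matmul(p_row, R_row)  # 1x2 result
col_result = matmul(R_col, p_col)  # 2x1 result

# Both notations describe the same rotated point, up to a transpose.
print(row_result, transpose(col_result))
```

Both products give the point $(-1, 1)$ up to floating-point rounding, one as a row and the other as a column.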

Basis Orientation

On the website, the camera basis vectors $X_1$, $Y_1$, and $Z_1$ form a left-handed coordinate system, because $X_1 \times Y_1 = - Z_1$. This is important as well, since you may notice some sign changes in the perspective matrix if the system were right-handed. You can read more about handedness on Wikipedia. As with row notation, this is another arbitrary decision that the author has made.
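As a rough illustration, handedness can be checked with the scalar triple product $(X \times Y) \cdot Z$. The sketch below assumes the basis vectors are given by their components in a right-handed reference frame. The function names and the sample vectors are hypothetical and are not taken from the website.

```python
def cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product of two vectors."""
    return sum(x * y for x, y in zip(a, b))

def handedness(x, y, z):
    """Classify a basis by the sign of the triple product (x cross y) . z."""
    triple = dot(cross(x, y), z)
    if triple > 0:
        return "right"
    if triple < 0:
        return "left"
    return "degenerate"

# Hypothetical example basis: the third axis is flipped, so X x Y = -Z.
X1, Y1, Z1 = (1, 0, 0), (0, 1, 0), (0, 0, -1)
print(cross(X1, Y1), handedness(X1, Y1, Z1))
```

For this sample basis the cross product is $(0, 0, 1) = -Z_1$, so the basis is classified as left-handed.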

Perspective Projection

Given the appropriate basis, and a projection plane at a distance $d$ from the origin, perspective projection is a simple exercise in similar triangles, as shown below.

(Figure: perspective projection diagram)

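The point $\mathbf{P} = (x,y,z)$ when projected onto the plane gives another point $\mathbf{p} = (x_p, y_p, z_p)$. The projected point lies on the plane, so $z_p = d$, and the similar triangles give $\frac{x_p}{d} = \frac{x}{z}$ and $\frac{y_p}{d} = \frac{y}{z}$. Hence $\mathbf{p}$ can be written as (in row notation) $$\mathbf{p} = \begin{bmatrix} d \displaystyle \frac{x}{z} & d \displaystyle \frac{y}{z} & d \end{bmatrix}.$$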
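The formula can be evaluated directly. The following is a minimal sketch: the function name `project` and the sample numbers are my own, and the code assumes $z \neq 0$.

```python
def project(point, d):
    """Project (x, y, z) onto the plane z = d using similar triangles."""
    x, y, z = point
    if z == 0:
        # A point at the same depth as the center of projection has no image.
        raise ValueError("cannot project a point with z == 0")
    return (d * x / z, d * y / z, d)

# Sample values chosen for illustration only.
p = project((2.0, 4.0, 8.0), 2.0)
print(p)  # (0.5, 1.0, 2.0)
```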

Now we just need to put this formula into matrix form (again, in row notation).

Homogeneous/Projective Coordinates

If we just use the raw 3D coordinates of $\bf P$ there is no way to represent the formula for $\bf p$ as a matrix multiplication, because the division by $z$ is not a linear operation. The use of homogeneous coordinates allows us to do exactly that. We start by augmenting $\bf P$ with a fourth coordinate, defining it as $\mathbf{P} = (x, y, z, 1)$, and we use 4x4 homogeneous matrices instead of 3x3 matrices to represent transformations. I strongly urge you to read about projective spaces, as otherwise the following explanation will look like too much hand-waving, since I am not going to dwell on mathematical formalities.

Now, in this new space every point with coordinates $(wx,wy,wz,w)$, with $w \neq 0$, can be mapped onto the point $(x,y,z,1)$ by a simple division by $w$. We can use this property to our advantage and fill out a matrix in such a way that, after the division by $w$ is made, we get exactly the perspective projection formula for our point.

$$ \begin{bmatrix} x & y & z & \displaystyle\frac{z}{d} \end{bmatrix} = \begin{bmatrix} x & y & z & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & \displaystyle \frac{1}{d}\\ 0 & 0 & 0 & 0 \end{bmatrix} $$

Now if we take the point $(x, y, z, \frac{z}{d})$ and divide it by its last coordinate, we get $(d\displaystyle\frac{x}{z}, d\displaystyle\frac{y}{z}, d, 1)$. These are exactly the coordinates of our projected point once we undo the augmentation to four dimensions (i.e., discard the last coordinate).

As you can see, the "perspective matrix" does not actually perform the perspective transformation by itself. It only "prepares" the transformation by encoding a specific value in the last coordinate, such that after all coordinates are divided by it, we get the true projection.
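The two steps (matrix multiplication, then division by the last coordinate) can be sketched in plain Python as follows. The function names and the sample point are assumptions made for illustration. The matrix is the one given above, in row notation.

```python
def perspective_matrix(d):
    """4x4 one-point perspective matrix for row vectors, plane at distance d."""
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 1.0 / d],
            [0.0, 0.0, 0.0, 0.0]]

def row_times_matrix(v, m):
    """Multiply a row vector v by a matrix m (the vector is on the left)."""
    return [sum(v[k] * m[k][j] for k in range(len(v)))
            for j in range(len(m[0]))]

def project_homogeneous(point, d):
    """Apply the matrix, then divide every coordinate by the last one."""
    x, y, z = point
    h = row_times_matrix([x, y, z, 1.0], perspective_matrix(d))
    w = h[3]  # equals z / d
    if w == 0:
        raise ValueError("cannot divide by w == 0 (point has z == 0)")
    return [c / w for c in h]

# Sample values chosen for illustration only.
before_divide = row_times_matrix([2.0, 4.0, 8.0, 1.0], perspective_matrix(2.0))
after_divide = project_homogeneous((2.0, 4.0, 8.0), 2.0)
print(before_divide)  # [2.0, 4.0, 8.0, 4.0]
print(after_divide)   # [0.5, 1.0, 2.0, 1.0]
```

Note that the first three coordinates are unchanged by the matrix. Only the division by $w = z/d$ produces the foreshortening.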

$\endgroup$
  • $\begingroup$ If I simply multiply the two matrices I get $\begin{bmatrix} x & y & z & \frac{z}{d}+1\end{bmatrix}$ $\endgroup$ Commented Nov 14, 2018 at 2:11
  • $\begingroup$ Sorry for the mistake! I typed the identity matrix in the editor and just changed it from there. I forgot to change the one to a zero in the last row. I edited the answer with the correction. Thanks for spotting it! $\endgroup$
    – vgs
    Commented Nov 14, 2018 at 3:28
  • $\begingroup$ Is this derivation the same as: this derivation that I have done in the second derivation? What about the first derivation? Is it also only for one-point perspective? Or is it more general, for all point perspectives? $\endgroup$ Commented Nov 14, 2018 at 5:21
  • $\begingroup$ Yes, they are essentially the same, but it is likely that your frame of reference has a different definition (one or more axes pointing in a different direction), and you were certainly using column instead of row notation. The first derivation is also similar, but for some reason the author decided to generalize the center point of the projection. It is usual in graphics programming to let the center be the origin of camera space. $\endgroup$
    – vgs
    Commented Nov 14, 2018 at 21:15
  • $\begingroup$ Just to be sure: the projection being performed in the link that I gave you in the comments above is a one-point projection, even though nothing was specified, right? $\endgroup$ Commented Nov 15, 2018 at 6:39
