60
$\begingroup$

Background: Many (if not all) of the transformation matrices used in $3D$ computer graphics are $4\times 4$, and the vectors they act on have four components: the three values for $x$, $y$ and $z$, plus an additional term which usually has a value of $1$.

Given the extra computing effort required to multiply $4\times 4$ matrices instead of $3\times 3$ matrices, there must be a substantial benefit to including that extra fourth term, even though $3\times 3$ matrices should (?) be sufficient to describe points and transformations in 3D space.

Question: Why is the inclusion of a fourth term beneficial? I can guess that it makes the computations easier in some manner, but I would really like to know why that is the case.

$\endgroup$
1
  • 10
    $\begingroup$ Short answer: 3x3 describes rotation/skew/scale, etc. You need the 4x4 in order to describe translation. $\endgroup$
    – Justin L.
    Commented Jul 21, 2010 at 21:25

3 Answers

56
$\begingroup$

I'm going to copy my answer from Stack Overflow, which also shows why 4-component vectors (and hence 4×4 matrices) are used instead of 3-component ones.


In most 3D graphics a point is represented by a 4-component vector (x, y, z, w), where w = 1. The usual operations applied to a point include translation, scaling, rotation, reflection, skewing and combinations of these.

These transformations can be represented by a mathematical object called a matrix. A matrix acts on a vector like this:

[ a b c tx ] [ x ]   [ a*x + b*y + c*z + tx*w ]
| d e f ty | | y | = | d*x + e*y + f*z + ty*w |
| g h i tz | | z |   | g*x + h*y + i*z + tz*w |
[ p q r s  ] [ w ]   [ p*x + q*y + r*z +  s*w ]

For example, scaling is represented as

[ 2 . . . ] [ x ]   [ 2x ]
| . 2 . . | | y | = | 2y |
| . . 2 . | | z |   | 2z |
[ . . . 1 ] [ 1 ]   [ 1  ]

and translation as

[ 1 . . dx ] [ x ]   [ x + dx ]
| . 1 . dy | | y | = | y + dy |
| . . 1 dz | | z |   | z + dz |
[ . . . 1  ] [ 1 ]   [   1    ]

One of the reasons for the 4th component is to make a translation representable by a matrix.

The advantage of using a matrix is that multiple transformations can be combined into one via matrix multiplication.
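As a rough illustration of that combining step, here is a minimal pure-Python sketch. The helper names (`mat_mul`, `mat_vec`, `scaling`, `translation`) and the sample numbers are made up for this example and do not come from any particular graphics library.

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices given as nested lists (rows of a by columns of b)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mat_vec(m, v):
    """Apply a 4x4 matrix to a 4-component column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def scaling(s):
    """Uniform scaling by s; the last row stays [0 0 0 1]."""
    return [[s, 0, 0, 0],
            [0, s, 0, 0],
            [0, 0, s, 0],
            [0, 0, 0, 1]]

def translation(dx, dy, dz):
    """Translation by (dx, dy, dz), stored in the last column."""
    return [[1, 0, 0, dx],
            [0, 1, 0, dy],
            [0, 0, 1, dz],
            [0, 0, 0, 1]]

point = [1, 2, 3, 1]                 # (x, y, z, w) with w = 1
scale = scaling(2)
translate = translation(10, 0, -5)

# Applying the two matrices one after the other...
step_by_step = mat_vec(translate, mat_vec(scale, point))

# ...gives the same result as applying their product once.
# The rightmost factor acts first: scale, then translate.
combined = mat_mul(translate, scale)
all_at_once = mat_vec(combined, point)

print(step_by_step)   # [12, 4, 1, 1]
print(all_at_once)    # [12, 4, 1, 1]
```

The order of the factors matters: `mat_mul(scale, translate)` would translate first and then scale, which in general gives a different point.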

Now, if the only purpose were to bring translation to the table, we could always write (x, y, z, 1) instead of (x, y, z, w) and fix the last row of the matrix as [0 0 0 1], as is usually done for 2D graphics. In practice w is allowed to vary, and the 4-component vector is mapped back to an ordinary 3-vector via this formula:

[ x(3D) ]   [ x / w ]
| y(3D) | = | y / w |
[ z(3D) ]   [ z / w ]

These are called homogeneous coordinates. Allowing w to differ from 1 makes perspective projection expressible as a matrix too, so it can again be combined with all the other transformations.

For example, since objects farther away should be smaller on screen, we transform the 3D coordinates into 2D using the formula

x(2D) = x(3D) / (10 * z(3D))
y(2D) = y(3D) / (10 * z(3D))

Now if we apply the projection matrix

[ 1 . .  . ] [ x ]   [  x   ]
| . 1 .  . | | y | = |  y   |
| . . 1  . | | z |   |  z   |
[ . . 10 . ] [ 1 ]   [ 10*z ]

then the real 3D coordinates would become

x(3D) := x/w = x/(10*z)
y(3D) := y/w = y/(10*z)
z(3D) := z/w = 0.1

so we just need to chop the z-coordinate out to project to 2D.
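The projection above can be tried out with a small pure-Python sketch. The function name `project` and the sample points are hypothetical, chosen only to illustrate the divide by w; the sketch assumes z is nonzero.

```python
def mat_vec(m, v):
    """Apply a 4x4 matrix to a 4-component column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

# The projection matrix from the example: the last row copies 10*z into w.
projection = [[1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 10, 0]]

def project(point3d):
    """Map a 3D point to 2D: multiply by the matrix, divide by w, drop z.

    Assumes z != 0, otherwise w is zero and the division fails.
    """
    x, y, z = point3d
    hx, hy, hz, w = mat_vec(projection, [x, y, z, 1])
    return (hx / w, hy / w)

near = project((20.0, 10.0, 1.0))
far = project((20.0, 10.0, 4.0))

print(near)  # (2.0, 1.0)
print(far)   # (0.5, 0.25)
```

The same x and y land closer to the centre of the screen when z is four times larger, which is the "farther away looks smaller" effect.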

$\endgroup$
46
$\begingroup$

Even though 3x3 matrices should (?) be sufficient to describe points and transformations in 3D space.

No, they aren't enough! Suppose you represent points in space using 3D vectors. You can transform these using 3x3 matrices. But if you examine the definition of matrix multiplication, you should see immediately that multiplying the zero 3D vector by a 3x3 matrix gives you the zero vector again. So simply multiplying by a 3x3 matrix can never move the origin. But translations, and rotations about any point other than the origin, do need to move the origin. So 3x3 matrices are not enough.

I haven't tried to explain exactly how 4x4 matrices are used. But I hope I've convinced you that 3x3 matrices aren't up to the task and that something more is needed.
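For anyone who wants to check the zero-vector claim numerically, here is a minimal sketch in pure Python. The helper `mat_vec3` and the two sample matrices are made up for the illustration.

```python
import math

def mat_vec3(m, v):
    """Apply a 3x3 matrix to a 3-component column vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

angle = math.radians(30)
rotation_z = [[math.cos(angle), -math.sin(angle), 0.0],
              [math.sin(angle),  math.cos(angle), 0.0],
              [0.0,              0.0,             1.0]]

arbitrary = [[2.0, -1.0, 7.0],
             [0.5,  3.0, 4.0],
             [9.0,  8.0, -6.0]]

origin = [0.0, 0.0, 0.0]

# Every entry of the result is a sum of terms of the form m[i][k] * 0,
# so the origin stays where it is no matter which matrix is used.
rotated_origin = mat_vec3(rotation_z, origin)
mapped_origin = mat_vec3(arbitrary, origin)

print(rotated_origin == origin)  # True
print(mapped_origin == origin)   # True
```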

$\endgroup$
3
  • 3
    $\begingroup$ Why use a 4x4 matrix rather than using a 3x3 and 3x1, or using a 3x4? $\endgroup$ Commented Jul 21, 2010 at 15:52
  • 1
    $\begingroup$ Actually, for rotations and translations, 3x4 is fine. 3x4 is just a way to collect 3x3 and 3x1 together in one array. $\endgroup$
    – Dan Piponi
    Commented Jul 21, 2010 at 15:59
  • 1
    $\begingroup$ +1 for the minimal explanation! $\endgroup$ Commented Feb 23, 2012 at 8:14
13
$\begingroup$

To follow up user80's answer, you want to get transformations of the form v --> Av + b, where A is a 3 by 3 matrix (the linear part of the transformation) and b is a 3-vector. We can encode this transformation in a 4 x 4 matrix by putting A in the top left with three 0's below it and making the last column be (b,1). Multiplying the 4-vector (v,1) by this matrix will give you (Av + b, 1).
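
Written in block form (with $0^{T}$ denoting the row of three zeros), the encoding reads

$$
\begin{pmatrix} A & b \\ 0^{T} & 1 \end{pmatrix}
\begin{pmatrix} v \\ 1 \end{pmatrix}
=
\begin{pmatrix} Av + b \\ 1 \end{pmatrix}.
$$

As a quick check that composition also works, take two such maps with hypothetical labels $(A_1, b_1)$ applied first and $(A_2, b_2)$ applied second. Their product is

$$
\begin{pmatrix} A_2 & b_2 \\ 0^{T} & 1 \end{pmatrix}
\begin{pmatrix} A_1 & b_1 \\ 0^{T} & 1 \end{pmatrix}
=
\begin{pmatrix} A_2 A_1 & A_2 b_1 + b_2 \\ 0^{T} & 1 \end{pmatrix},
$$

which is the matrix of $v \mapsto A_2(A_1 v + b_1) + b_2$, so the product of two matrices of this form again has this form.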

$\endgroup$