
With perspective projection, we can unproject the screen space coordinates of the cursor to the near and far planes of the frustum and calculate the direction of the ray through the cursor.

        Vector4 cScreen0 = Vector4(cursorNormX, cursorNormY, -1, 1);
        Vector4 cView0 = Inverse(projection)*cScreen0;
        cView0 = cView0*(1/cView0.w);
        Vector4 cWorld0 = Inverse(view) * cView0;

        Vector4 cScreen1 = Vector4(cursorNormX, cursorNormY, 1, 1);
        Vector4 cView1 = Inverse(projection)*cScreen1;
        cView1 = cView1*(1/cView1.w);
        Vector4 cWorld1 = Inverse(view) * cView1;

        Vector3 cursorDir = normalize(cWorld1.xyz()-cWorld0.xyz());

Now, however, with orthographic projection the far and near planes are of the same size, so we can't calculate the direction of the cursor this way. The direction is going to be equal to the world's z-axis. (I haven't had much sleep, so I hope this makes sense.)

So instead I calculated the cursor position by unprojecting the cursor with a zeroed-out z value. We can calculate the x and y coordinates and set the z coordinate later as we like.

        Vector4 cScreen = Vector4(cursorNormX, cursorNormY, 0, 0);
        Vector4 cView = Inverse(projection)*cScreen;
        cView = Vector4(cView.x, cView.y, 0, 0);
        Vector4 cWorld = Inverse(View) * cView;
        cWorld = Vector4(cWorld.x, cWorld.y, 0, 0);

        Vector3 cursorPos = cWorld.xyz();

However, I'm not getting the correct results from the projection. What am I missing?

The purpose of this is to be able to cast rays in the direction of the cursor.

  • I don't really get what exactly you are asking: "how does it work" or "where is the error in my code"? Also, what results do you expect, and which ones do you get? In an orthographic projection, all rays are parallel to the camera's viewing direction. So you basically just need to find the position on the near plane that corresponds to your cursor position and fire a ray in the camera's view direction.
    – wychmaster
    Commented Jun 19, 2020 at 10:18
  • @wychmaster "So you basically just need to find the position on the near plane that corresponds to your cursor position": that's what I tried to do in the second code snippet. The result I'm getting is that the calculated cursor position doesn't match the position of the cursor.
    Commented Jun 19, 2020 at 10:22
  • Have a look at my answer to this question. It is basically the same problem, just that you are looking for a point on the near plane (z = -1) and you use a different projection. As far as I can see, the vector cScreen in your second code snippet needs a w component of 1, not 0. I don't know if that already fixes the problem; you might also need to divide the result vector cView by w.
    – wychmaster
    Commented Jun 19, 2020 at 11:42
  • @wychmaster Appreciate it. Unfortunately, changing w or dividing by w gives the same result. With my implementation, the x and y coordinates do get offset in the direction of the cursor, but by a much larger amount. So x and y maybe need to be divided by something, but dividing by w doesn't do anything.
    Commented Jun 19, 2020 at 13:47

1 Answer


I am still not 100% sure I understood your question, because of this sentence:

Now, however, with orthographic projection the far and near planes are of the same size, so we can't calculate the direction of the cursor this way. The direction is going to be equal to the world's z-axis. (I haven't had much sleep, so I hope this makes sense.)

If I misunderstood you, let me know in the comments and I will adjust or remove my answer.

However, if I understood your intention correctly and you want to cast a ray through your frustum (for example, to pick objects), then your statement is wrong. The direction will be equal to the view space's negative z-direction, not the world space's. So all you need to do is transform your direction vector, or the near and far plane points, to world space. To prove that this works, I have implemented everything in a Python script that you will find at the end of this answer. If you have a Python interpreter with Matplotlib and NumPy installed, you can modify the setup parameters and experiment a bit yourself.

So let's have a look at the relevant implementation. First, we calculate the mouse position in clip space and the two corresponding points on the near and far planes.

mouse_pos_x_clip = mouse_pos_x_screen / screen_width * 2 - 1
mouse_pos_y_clip = mouse_pos_y_screen / screen_height * 2 - 1

mouse_pos_near_clip = np.array([mouse_pos_x_clip, mouse_pos_y_clip, -1, 1], dtype=float)
mouse_pos_far_clip = np.array([mouse_pos_x_clip, mouse_pos_y_clip, 1, 1], dtype=float)

Now we get the involved matrices. My notation here is as follows: I use two characters after M_ that abbreviate the involved spaces. The first character is the source space and the second the target space. The characters are c for clip space, v for view space, and w for world space. So M_vc is the view space to clip space transformation, a.k.a. the projection matrix.

M_wv = get_world_to_view_matrix(camera_pitch, camera_yaw, camera_position)
if perspective:
    M_vc = get_perspective_mat(field_of_view, z_near_plane, z_far_plane, aspect_ratio)
else:
    M_vc = get_orthogonal_mat(frustum_width, frustum_height, z_near_plane, z_far_plane)

M_vw = np.linalg.inv(M_wv)
M_cv = np.linalg.inv(M_vc)

Now I simply use the correct transformation matrices to transform from clip space to world space. Note that the perspective projection needs a division by w after the transformation to view space. This is not necessary for the orthographic projection, but performing it anyway does not affect the result.

mouse_pos_near_view = np.matmul(M_cv, mouse_pos_near_clip)
mouse_pos_far_view = np.matmul(M_cv, mouse_pos_far_clip)

if perspective:
    mouse_pos_near_view= mouse_pos_near_view / mouse_pos_near_view[3]
    mouse_pos_far_view= mouse_pos_far_view / mouse_pos_far_view[3]

mouse_pos_near_world = np.matmul(M_vw, mouse_pos_near_view)
mouse_pos_far_world = np.matmul(M_vw, mouse_pos_far_view)

This is, as far as I can see, identical to your first code section. Now let's have a look at the results for perspective and orthographic projection with the following setup parameters:

screen_height = 1080
screen_width = 1980

mouse_pos_x_screen = 500
mouse_pos_y_screen = 300

camera_position = [3, 0, 1]
camera_yaw = 20
camera_pitch = 30

z_near_plane = 0.5
z_far_plane = 3

# only orthogonal
frustum_width = 3
frustum_height = 2

# only perspective
field_of_view = 70
aspect_ratio = screen_width / screen_height

The screen space and clip space values are identical for both projections:

[Plots: screen space and clip space]

The red line connects the two points on the near and far planes. The red dot is the point on the near plane, which is your "screen" in 3D space. The green lines mark the borders of the frustum; in clip space, it is obviously just a cube. An important thing to realize is that clip space is defined in a left-handed coordinate system, while the other coordinate systems are usually right-handed (have a look at the images in this link). I mention it since I had some problems with the plots until I realized that.
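One way to make the handedness flip concrete is to check where view space near and far plane points land on the clip space z-axis. A minimal sketch (the matrix mirrors the script's get_perspective_mat, with example parameters):

```python
import numpy as np

def perspective(fov_deg, z_near, z_far, aspect):
    # Same form as get_perspective_mat in the script below.
    f = 1 / np.tan(np.radians(fov_deg) / 2)
    return np.array([
        [f / aspect, 0, 0, 0],
        [0, f, 0, 0],
        [0, 0, (z_far + z_near) / (z_near - z_far),
               2 * z_far * z_near / (z_near - z_far)],
        [0, 0, -1, 0],
    ])

M = perspective(70, 0.5, 3, 1980 / 1080)

# View space is right-handed: the camera looks down -z, so the near
# plane sits at z = -0.5 and the far plane at z = -3.
near = M @ np.array([0, 0, -0.5, 1.0])
far = M @ np.array([0, 0, -3.0, 1.0])
print(near[2] / near[3], far[2] / far[3])  # -> -1.0 1.0
```

After the perspective divide, the near plane maps to z = -1 and the far plane to z = +1, so increasing depth runs along +z: the handedness has flipped relative to view space.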

Now for perspective projection, I get the following plots:

[Plots: view space and world space, perspective projection]

The blue dot is the camera position. If I just exchange the perspective matrix with an orthographic projection matrix, the results look like this:

[Plots: view space and world space, orthographic projection]

As you can see, the approach you used in your first code section works independently of the chosen projection. I don't know why you thought it wouldn't. My assumption is that you made a small mistake in the implementation of the orthographic projection matrix. For example, if you accidentally flipped rows and columns (transposed) of the orthographic projection matrix, you get total crap like this:

[Plot: result with transposed orthographic projection matrix]

I know this looks like a wrong implementation of the perspective projection, but it is what I get when I transpose the orthographic projection matrix before the multiplication.

So make sure you use the correct orthographic projection matrix (source):

$$ \begin{bmatrix} \frac{2}{w}&0&0&0\\ 0&\frac{2}{h}&0&0\\ 0&0&\frac{-2}{f-n}&-\frac{f+n}{f-n}\\ 0&0&0&1 \end{bmatrix} $$

Here $w$ is the frustum width, $h$ the frustum height, $f$ the far plane z-value, and $n$ the near plane z-value. This is the representation if you use column vectors and left-multiplied matrices; for row vectors and right-multiplied matrices, you need to transpose it.
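As a quick self-check of the convention, you can multiply this matrix with a known frustum corner and verify that it lands on the corresponding corner of the clip cube. A small NumPy sketch with example frustum values (w = 3, h = 2, n = 0.5, f = 3):

```python
import numpy as np

w, h, n, f = 3.0, 2.0, 0.5, 3.0  # example frustum width/height, near/far

M_ortho = np.array([
    [2 / w, 0, 0, 0],
    [0, 2 / h, 0, 0],
    [0, 0, -2 / (f - n), -(f + n) / (f - n)],
    [0, 0, 0, 1],
])

# Top-right corner of the near plane in view space (the camera looks
# down -z, so the near plane sits at z = -n).
corner_view = np.array([w / 2, h / 2, -n, 1.0])
print(M_ortho @ corner_view)  # -> approximately [1, 1, -1, 1]

# The transposed matrix fails this check; that is the "total crap"
# scenario from the plots above:
print(np.allclose(M_ortho.T @ corner_view, [1, 1, -1, 1]))  # -> False
```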

Your second approach:

Vector4 cScreen = Vector4(cursorNormX, cursorNormY, 0, 0);
Vector4 cView = Inverse(projection)*cScreen;
cView = Vector4(cView.x, cView.y, 0, 0);
Vector4 cWorld = Inverse(View) * cView;
cWorld = Vector4(cWorld.x, cWorld.y, 0, 0);

Vector3 cursorPos = cWorld.xyz();

has multiple issues, all related to the z- and w-components of your vectors. Basically, you need to do the same transformations as in your first approach, so use Vector4 cScreen = Vector4(cursorNormX, cursorNormY, -1, 1); as the initial vector.

One problem with the line cView = Vector4(cView.x, cView.y, 0, 0); is that your z-component should be identical to your near plane value, not zero. You might get away with that, since it would just shift your point a little in the camera's viewing direction in world space. More problematic is that you set w to 0, which makes it impossible to apply any translation to the vector by $4 \times 4$ matrix multiplication. So when you transform to world space, you will always end up with a point that treats the camera as being located at the coordinate system origin, regardless of its true position. You therefore need to set the w-component to 1. However, if the previous lines are correct, you automatically get the correct z- and w-values, which makes this line obsolete.

Lastly, the line cWorld = Vector4(cWorld.x, cWorld.y, 0, 0); doesn't make much sense to me either. Your camera is somewhere in 3D world space, so why remove the z-component you previously calculated? With this, you move the point into the xy-plane for no reason. Just remove this line.
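Putting all of these fixes together, the corrected second approach simply repeats the pattern of your first snippet for a single point. A NumPy sketch with a hypothetical orthographic projection (3 x 2 frustum, near plane 0.5, far plane 3) and, for brevity, an identity view matrix:

```python
import numpy as np

M_vc = np.array([               # example orthographic projection matrix
    [2 / 3, 0, 0, 0],
    [0, 2 / 2, 0, 0],
    [0, 0, -2 / 2.5, -3.5 / 2.5],
    [0, 0, 0, 1],
])
M_wv = np.eye(4)                # identity view matrix to keep the example short

cursor_norm_x, cursor_norm_y = 0.25, -0.5   # cursor in normalized coordinates

c_screen = np.array([cursor_norm_x, cursor_norm_y, -1, 1.0])  # z = -1, w = 1
c_view = np.linalg.inv(M_vc) @ c_screen     # no overwriting of z and w
c_world = np.linalg.inv(M_wv) @ c_view      # no overwriting here either
cursor_pos = c_world[:3]
print(cursor_pos)  # -> approximately [0.375, -0.5, -0.5]: x and y scaled by
                   #    the half frustum extents, z on the near plane
```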

To get the camera's viewing direction without using the far plane point, just multiply the vector [0, 0, -1, 0] by the view-to-world matrix (M_vw). In this case, the w-component really has to be 0, since you do not want to apply translations to it: it is a direction vector, not a point. The z-component needs to be -1 because the camera looks in the negative z-direction by definition. Notice that the transformed vector is usually not of unit length anymore, so you might want to normalize it.
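In code, the direction extraction is just one matrix-vector product plus a normalization; a minimal sketch, assuming M_vw is the view-to-world matrix from above:

```python
import numpy as np

def camera_direction_world(M_vw):
    """View space forward axis (0, 0, -1, 0) transformed to world space."""
    d = M_vw @ np.array([0.0, 0.0, -1.0, 0.0])  # w = 0: a direction, not a point
    return d[:3] / np.linalg.norm(d[:3])        # normalize, in case M_vw scales

# Identity view-to-world: camera at origin, looking down the world -z axis.
print(camera_direction_world(np.eye(4)))  # -> [ 0.  0. -1.]

# A 90 degree yaw (rotation about y) turns the camera toward world -x.
yaw = np.radians(90)
R = np.array([
    [np.cos(yaw), 0, np.sin(yaw), 0],
    [0, 1, 0, 0],
    [-np.sin(yaw), 0, np.cos(yaw), 0],
    [0, 0, 0, 1],
])
print(camera_direction_world(R))  # -> approximately [-1, 0, 0]
```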

Additional Note

For an orthographic projection, there is no need to calculate the inverse projection matrix. You can calculate the x and y values directly with something like this (untested pseudo-code):

x_view = (x_screen / screen_width - 0.5) * frustum_width
y_view = (y_screen / screen_height - 0.5) * frustum_height

Then you get the screen space point in view space by setting (untested pseudo-code):

point_view_near = [x_view, y_view, -z_near, 1]

Be careful to use the negative near plane z-value! For the far plane, you can do the same.
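Since the pseudo-code above is untested, here is a runnable sketch (with example screen and frustum values) that checks the direct computation against the inverse-projection route used earlier in the answer:

```python
import numpy as np

screen_width, screen_height = 1980, 1080
frustum_width, frustum_height = 3.0, 2.0
z_near, z_far = 0.5, 3.0
x_screen, y_screen = 500, 300

# Direct route: scale the screen position straight into view space.
x_view = (x_screen / screen_width - 0.5) * frustum_width
y_view = (y_screen / screen_height - 0.5) * frustum_height
point_view_near = np.array([x_view, y_view, -z_near, 1.0])

# Reference route: clip space coordinates through the inverse projection.
M_vc = np.array([
    [2 / frustum_width, 0, 0, 0],
    [0, 2 / frustum_height, 0, 0],
    [0, 0, -2 / (z_far - z_near), -(z_far + z_near) / (z_far - z_near)],
    [0, 0, 0, 1],
])
clip = np.array([x_screen / screen_width * 2 - 1,
                 y_screen / screen_height * 2 - 1, -1, 1.0])
print(np.allclose(point_view_near, np.linalg.inv(M_vc) @ clip))  # -> True
```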

Full Python script

import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 unused import
import matplotlib.pyplot as plt

# setup --------------------------------------------------------------------------------

screen_height = 1080
screen_width = 1980

mouse_pos_x_screen = 500
mouse_pos_y_screen = 300


camera_position = [3, 0, 1]
camera_yaw = 20
camera_pitch = 30

# ----------------
# projection setup
# ----------------
perspective = False # set 'False' for orthogonal and 'True' for perspective projection

z_near_plane = 0.5
z_far_plane = 3

# only orthogonal
frustum_width = 3
frustum_height = 2

# only perspective
field_of_view = 70
aspect_ratio = screen_width / screen_height

# functions ----------------------------------------------------------------------------


def render_frustum(points, camera_pos, ax, right_handed=True):
    line_indices = [
        [0, 1],
        [0, 2],
        [0, 4],
        [1, 3],
        [1, 5],
        [2, 3],
        [2, 6],
        [3, 7],
        [4, 5],
        [4, 6],
        [5, 7],
        [6, 7],
    ]
    for idx_pair in line_indices:
        line = np.transpose([points[idx_pair[0]], points[idx_pair[1]]])
        ax.plot(line[2], line[0], line[1], "g")
    if right_handed:
        ax.set_xlim([-5, 5])
    else:
        ax.set_xlim([5, -5])
    ax.set_ylim([-5, 5])
    ax.set_zlim([-5, 5])
    ax.set_xlabel("z")
    ax.set_ylabel("x")
    ax.set_zlabel("y")
    ax.plot([-5, 5], [0, 0], [0, 0], "k")
    ax.plot([0, 0], [-5, 5], [0, 0], "k")
    ax.plot([0, 0], [0, 0], [-5, 5], "k")
    if camera_pos is not None:
        ax.scatter(
            camera_pos[2], camera_pos[0], camera_pos[1], marker="o", color="b", s=30
        )

def render_ray(p0,p1,ax):
    ax.plot([p0[2], p1[2]], [p0[0], p1[0]], [p0[1], p1[1]], color="r")
    ax.scatter(p0[2], p0[0], p0[1], marker="o", color="r")


def get_perspective_mat(fov_deg, z_near, z_far, aspect_ratio):
    fov_rad = fov_deg * np.pi / 180
    f = 1 / np.tan(fov_rad / 2)

    return np.array(
        [
            [f / aspect_ratio, 0, 0, 0],
            [0, f, 0, 0],
            [
                0,
                0,
                (z_far + z_near) / (z_near - z_far),
                2 * z_far * z_near / (z_near - z_far),
            ],
            [0, 0, -1, 0],
        ]
    )


def get_orthogonal_mat(width, height, z_near, z_far):
    r = width / 2
    t = height / 2

    return np.array(
        [
            [1 / r, 0, 0, 0],
            [0, 1 / t, 0, 0],
            [
                0,
                0,
                -2 / (z_far - z_near),
                -(z_far + z_near) / (z_far - z_near),
            ],
            [0, 0, 0, 1],
        ]
    )


def get_rotation_mat_x(angle_rad):
    s = np.sin(angle_rad)
    c = np.cos(angle_rad)
    return np.array(
        [[1, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1]], dtype=float
    )


def get_rotation_mat_y(angle_rad):
    s = np.sin(angle_rad)
    c = np.cos(angle_rad)
    return np.array(
        [[c, 0, s, 0], [0, 1, 0, 0], [-s, 0, c, 0], [0, 0, 0, 1]], dtype=float
    )


def get_translation_mat(position):
    return np.array(
        [
            [1, 0, 0, position[0]],
            [0, 1, 0, position[1]],
            [0, 0, 1, position[2]],
            [0, 0, 0, 1],
        ],
        dtype=float,
    )


def get_world_to_view_matrix(pitch_deg, yaw_deg, position):
    pitch_rad = np.pi / 180 * pitch_deg
    yaw_rad = np.pi / 180 * yaw_deg

    orientation_mat = np.matmul(
        get_rotation_mat_x(-pitch_rad), get_rotation_mat_y(-yaw_rad)
    )
    translation_mat = get_translation_mat(-1 * np.array(position, dtype=float))
    return np.matmul(orientation_mat, translation_mat)


# script -------------------------------------------------------------------------------

mouse_pos_x_clip = mouse_pos_x_screen / screen_width * 2 - 1
mouse_pos_y_clip = mouse_pos_y_screen / screen_height * 2 - 1

mouse_pos_near_clip = np.array([mouse_pos_x_clip, mouse_pos_y_clip, -1, 1], dtype=float)
mouse_pos_far_clip = np.array([mouse_pos_x_clip, mouse_pos_y_clip, 1, 1], dtype=float)



M_wv = get_world_to_view_matrix(camera_pitch, camera_yaw, camera_position)
if perspective:
    M_vc = get_perspective_mat(field_of_view, z_near_plane, z_far_plane, aspect_ratio)
else:
    M_vc = get_orthogonal_mat(frustum_width, frustum_height, z_near_plane, z_far_plane)

M_vw = np.linalg.inv(M_wv)
M_cv = np.linalg.inv(M_vc)

mouse_pos_near_view = np.matmul(M_cv,mouse_pos_near_clip)
mouse_pos_far_view = np.matmul(M_cv,mouse_pos_far_clip)

if perspective:
    mouse_pos_near_view= mouse_pos_near_view / mouse_pos_near_view[3]
    mouse_pos_far_view= mouse_pos_far_view / mouse_pos_far_view[3]

mouse_pos_near_world = np.matmul(M_vw, mouse_pos_near_view)
mouse_pos_far_world = np.matmul(M_vw, mouse_pos_far_view)

# calculate view frustum ---------------------------------------------------------------

points_clip = np.array(
    [
        [-1, -1, -1, 1],
        [ 1, -1, -1, 1],
        [-1,  1, -1, 1],
        [ 1,  1, -1, 1],
        [-1, -1,  1, 1],
        [ 1, -1,  1, 1],
        [-1,  1,  1, 1],
        [ 1,  1,  1, 1],
    ],
    dtype=float,
)

points_view = []
points_world = []
for i in range(8):
    points_view.append(np.matmul(M_cv, points_clip[i]))
    points_view[i] = points_view[i] / points_view[i][3]
    points_world.append(np.matmul(M_vw, points_view[i]))


# plot everything ----------------------------------------------------------------------

plt.figure()
plt.plot(mouse_pos_x_screen,mouse_pos_y_screen, marker="o", color="r")
plt.xlim([0, screen_width])
plt.ylim([0, screen_height])
plt.xlabel("x")
plt.ylabel("y")
plt.title("screen space")

plt.figure()
ax_clip_space = plt.subplot(projection="3d")
render_ray(mouse_pos_near_clip, mouse_pos_far_clip, ax_clip_space)
render_frustum(points=points_clip, camera_pos=None, ax=ax_clip_space, right_handed=False)
ax_clip_space.set_title("clip space")

plt.figure()
ax_view = plt.subplot(projection="3d")
render_ray(mouse_pos_near_view, mouse_pos_far_view, ax_view)
render_frustum(points=points_view, camera_pos=[0, 0, 0], ax=ax_view)
ax_view.set_title("view space")

plt.figure()
ax_world = plt.subplot(projection="3d")
render_ray(mouse_pos_near_world, mouse_pos_far_world, ax_world)
render_frustum(points=points_world, camera_pos=camera_position, ax=ax_world)
ax_world.set_title("world space")

plt.show()
  • Sir, can you please help me with this camera math problem: math.stackexchange.com/questions/4068296/… I am dragging the camera with the mouse pointer (using an orthographic projection). When the camera is rotated, the object gets clipped, maybe because the orthographic box rotates as well.
    Commented Sep 7, 2021 at 6:42
