
As an exercise in trying to learn 3D math, I've been experimenting with hit-testing by casting a ray from 2D screen space into 3D world space. My first attempt was as follows:

Matrix m = Matrix.Invert(
    this.scene.Camera.ViewMatrix *
    this.scene.Camera.ProjectionMatrix *
    homogeneousToViewportTransform());
Vector3 rayOrigin = Vector3.TransformCoordinate(new Vector3(p.X, p.Y, 0), m);
Vector3 rayVector = Vector3.TransformCoordinate(new Vector3(0, 0, -1), m);

This gets the ray origin correct, but not the ray vector. My reasoning was that the vector [0, 0, -1] is facing into the screen when in screen space, so transforming that into world space would give the correct vector. But that doesn't seem to be the case. Looking around, I've found that the correct way to calculate this is by doing:

Vector3 rayVector = new Vector3(
    m.M31 - (m.M34 * rayOrigin.X),
    m.M32 - (m.M34 * rayOrigin.Y),
    m.M33 - (m.M34 * rayOrigin.Z));
rayVector.Normalize();

Can anyone explain what's wrong with my original reasoning, and what the actual algorithm for calculating the ray vector is doing?

  • Groky - Check out my answer to this question and the ray casting code I use: gamedev.stackexchange.com/questions/12360/… It's written in Java, but you should be able to tell what's happening.
    – House
    Commented Jul 2, 2011 at 0:16
  • Thanks Byte56 - yes, I've seen quite a bit of code that does something similar. My problem is in understanding why you need to do that and can't just translate a [0, 0, -1] vector into world space...
    – Groky
    Commented Jul 2, 2011 at 0:40
  • I might add that the purpose (and problem) of my exercise is to understand the maths... My maths brain is bad :(
    – Groky
    Commented Jul 2, 2011 at 0:42

4 Answers


EDIT

Please note that the following answer does not take several things into account; it is more an attempt to visualize the difference in picking between parallel and perspective projection, and probably not a great one at that. Please read all the comments for more details.

-----------------------

I believe that your original reasoning would be correct if you were using parallel projection. But since (I believe) you're using perspective projection, it doesn't work.

Imagine you have two planes perpendicular to the camera's view direction (both facing the camera). The green plane is closest and the red plane is behind it. From the camera's view it looks like this:

Camera perspective view

Now when I cast a ray out to the black dot, to do so correctly in perspective projection, I have to cast from where my camera is (which is the sharp point on the left) out to the dot. That's what the extra math does when creating the vector for the ray.

View of perspective camera

If I were to do the same in parallel projection, I can pretend that my camera is not a pin-point but a huge plane. Here is where I can use the vector [0, 0, -1]:

view of parallel camera

Here's the same view as the top image, but in parallel projection.

Anyway, I'm pretty sure that's your issue. It's not a mathematical answer, but I hope it helps you understand the difference between what you were thinking and how it's supposed to work. There is of course Math SE, where you can get more details on the math.
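To tie this back to the code in the question: one way to implement "cast from the camera point out through the dot" is to unproject the same pixel at two different depths and subtract. A minimal sketch, reusing the inverse matrix m and screen point p from the question, and assuming the viewport transform leaves depth running from 0 (near) to 1 (far):

// Unproject the pixel at the near and far ends of the view volume.
Vector3 nearPoint = Vector3.TransformCoordinate(new Vector3(p.X, p.Y, 0), m);
Vector3 farPoint  = Vector3.TransformCoordinate(new Vector3(p.X, p.Y, 1), m);

// The ray starts at the near point and heads toward the far point,
// i.e., away from the eye and out through the pixel.
Vector3 rayOrigin = nearPoint;
Vector3 rayVector = farPoint - nearPoint;
rayVector.Normalize();

With a perspective projection this direction is different for every pixel (the rays fan out from the eye); with a parallel projection the same subtraction gives the same direction for every pixel, which is what the two pictures above are showing.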

  • The same problem exists with parallel projection, assuming that homogeneous coordinates are used to support a camera not always located at the origin (i.e., the normal parallel projection case).
    – Crowley9
    Commented Jul 2, 2011 at 3:56
  • Thanks! That explains it nicely. The pictures really helped, especially when you're as slow with maths as I am :)
    – Groky
    Commented Jul 2, 2011 at 4:29
  • I'm not sure how it can explain it nicely given that it is completely wrong.
    – Crowley9
    Commented Jul 2, 2011 at 16:14
  • I'm finding so many nice people on these forums. @Crowley9, tell us WHY it's wrong, and I'll update my answer so that others may benefit from your clearly superior knowledge of the situation.
    – House
    Commented Jul 2, 2011 at 17:28
  • I didn't mean to offend, but it is wrong, yet the question poser marked it as the accepted answer anyway. I explained the phenomenon in my answer below. To be more specific, he is transforming two points (the origin O and a POINT on the far plane F = (0,0,-1), NOT a vector). The vector between the eye and the point on the far plane is M(F) - M(O) (which results in the correct maths the poster cited), where he is just using M(F). This is true for any matrix that moves the origin (i.e., any general camera that is not centered at the origin), even if the matrix is that of an orthographic camera.
    – Crowley9
    Commented Jul 2, 2011 at 22:52

Assuming you have a good understanding of how a point is transformed, the difference is that a vector is always defined relative to some point of reference (e.g., the origin): it is the offset from that reference point to another point. When you transform the point the vector points to, the point of reference is not transformed with it, so the vector changes and you get incorrect results.
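As a tiny, concrete illustration of that point-versus-vector distinction (made-up numbers, and assuming a SlimDX/SharpDX-style Matrix.Translation helper; substitute your library's equivalent):

// A pure translation by (5, 0, 0).
Matrix t = Matrix.Translation(5, 0, 0);

// Treated as a *point*, (0, 0, -1) is dragged along by the translation...
Vector3 tip    = Vector3.TransformCoordinate(new Vector3(0, 0, -1), t);  // (5, 0, -1)
Vector3 origin = Vector3.TransformCoordinate(new Vector3(0, 0,  0), t);  // (5, 0,  0)

// ...but the vector from the transformed origin to the transformed tip
// is unchanged, which is what a direction should do under a translation.
Vector3 dir = tip - origin;                                               // (0, 0, -1)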

To transform a vector v by a matrix M as you would expect, transform both the point it points to, M(v), and the point of reference/origin, M((0,0,0,1)) (i.e., the zero vector as a homogeneous point). Then compute M(v) - M((0,0,0,1)) to get the correctly transformed vector. Obviously, transforming (0,0,0,1) by M zeroes out most of the components of the matrix, so you end up with just the translation components and the matrix's homogeneous component remaining, which with some minor math turns into the code you have as a fix.
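To spell that last step out (my own sketch of the algebra, using DirectX-style row vectors; it may not be exactly how the cited fix was derived): the screen point at depth z is the row vector [p.X, p.Y, z, 1], and multiplying it by m gives a + z·r, where a = [p.X, p.Y, 0, 1]·m and r = (M31, M32, M33, M34) is the third row of m. The world-space point is the xyz part of that divided by its w component, and rayOrigin is simply this value at z = 0. Differentiating with respect to z at z = 0 (i.e., asking which way the unprojected point moves as the screen point is pushed into the screen) gives a direction proportional to (M31 - M34·rayOrigin.X, M32 - M34·rayOrigin.Y, M33 - M34·rayOrigin.Z), which is exactly the fixed code from the question once it is normalized.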

  • Thanks - I need to read this a few more times before I understand it, but I think it explains the reason the working code works.
    – Groky
    Commented Jul 2, 2011 at 4:34

Perhaps this would be of use? It's (probably the best) article on mouse picking.

  • That link is bad.
    – House
    Commented Apr 12, 2012 at 21:28

Long ago I developed a multiprocessor raytracing renderer for an old supercomputer at my college.

I addressed this problem by defining an observation point and the screen plane in world coordinates. The screen was defined by the world coordinate of the upper-left pixel, with the U and V directions pointing toward the next pixel to the right and the next pixel down, respectively.

Screen coordinate system

In this way the pixel (x,y) is centered on P + x·U + y·V.

If the view point is W, then you can cast the ray W + (P + x·U + y·V - W)·t, which is in world coordinates.

I think that in your case the trick is to put the screen in world coordinates, i.e. find the coordinates of the top-left, top-right and bottom-left pixels. With those vectors and the world-coordinate view point you can cast a ray through any pixel and rely on a world-coordinate parametric formulation, as sketched below.
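A minimal sketch of that setup, using the same C#-style vector math as the question; W, P, U and V are hypothetical values you would compute from your own camera:

// Ray through pixel (x, y), given a world-space description of the screen:
//   W = view (eye) point, P = center of the top-left pixel,
//   U = offset to the next pixel to the right, V = offset to the next pixel down.
static void PixelRay(Vector3 W, Vector3 P, Vector3 U, Vector3 V, int x, int y,
                     out Vector3 rayOrigin, out Vector3 rayVector)
{
    Vector3 pixelCenter = P + x * U + y * V;   // pixel (x, y) is centered here

    rayOrigin = W;                             // cast from the eye...
    rayVector = pixelCenter - W;               // ...toward the pixel's center,
    rayVector.Normalize();                     // along W + (pixelCenter - W)·t
}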
