The Direct3D/XNA vertex transform pipeline -- how to go between model coordinates and screen coordinates, and back again!

jwatte

1) WORLD matrix. This takes vertices from object-local space (0,0,0 in the middle of the object) to world space (position and orientation applied based on 0,0,0 at your "world origin" position). This is a convenient space to do normal mapped lighting and environmental reflection in.
2) VIEW matrix. This takes vertices from world space into camera space, where the camera is at 0,0,0 and "forward" is negative Z. This is a convenient space to do fogging and other depth-based effects in.
3) PROJECTION matrix. This takes vertices from camera space (where the visible Z goes from -NearPlane ... -FarPlane) and maps them to clip space, which after the divide by W becomes a box where X and Y go from -1 ... 1 and Z goes from 0 ... 1 (the Viewport transform can later squeeze Z into some smaller range). The post-PROJECTION vertices are what you output from the vertex shader. However, the vertex values are still homogeneous clip-space coordinates at that point -- the "homogeneous divide," which divides X, Y and Z by the W component, doesn't happen until after the vertex shader, as part of rasterization.
4) VIEWPORT transform. This is more subtle. It takes the post-divide box, where X and Y go from -1 ... 1 and Z goes from 0 ... 1, and transforms it to the viewport range (which is some sub-section of your window, or all of it). This means scaling and offsetting X and Y by the viewport width and height (with Y flipped, because window Y grows downward), and mapping Z into the viewport's depth buffer range.
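The four steps above can be sketched for a single vertex as follows. This is a minimal illustration in plain Python, not XNA code: the helper names (transform, translation, perspective_rh, viewport_transform) and the example numbers are made up for this sketch. It assumes row vectors (vertex * matrix), a right-handed camera looking down negative Z, and a viewport depth range of 0 ... 1.

```python
import math

def transform(v, m):
    """Row vector times 4x4 matrix (the v * M convention)."""
    return [sum(v[k] * m[k][c] for k in range(4)) for c in range(4)]

def translation(tx, ty, tz):
    """Translation matrix with the offset in the fourth row."""
    return [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [tx, ty, tz, 1]]

def perspective_rh(fov_y, aspect, near, far):
    """Right-handed perspective projection mapping depth to 0 ... 1."""
    f = 1.0 / math.tan(fov_y / 2.0)
    return [[f / aspect, 0, 0, 0],
            [0, f, 0, 0],
            [0, 0, far / (near - far), -1],
            [0, 0, near * far / (near - far), 0]]

def viewport_transform(ndc, x, y, width, height, min_depth=0.0, max_depth=1.0):
    """Map post-divide coordinates to window pixels and depth range."""
    return [x + (ndc[0] + 1.0) * 0.5 * width,
            y + (1.0 - ndc[1]) * 0.5 * height,   # window Y grows downward
            min_depth + ndc[2] * (max_depth - min_depth)]

# Assumed example scene: object at the world origin, camera at (0, 0, 5).
world = translation(0, 0, 0)
view = translation(0, 0, -5)          # moves the world so the camera sits at 0,0,0
projection = perspective_rh(math.radians(90), 800 / 600, 1.0, 100.0)

local = [1.0, 1.0, 0.0, 1.0]          # vertex in object-local space
in_world = transform(local, world)    # 1) WORLD
in_camera = transform(in_world, view) # 2) VIEW: z is now -5 (in front of the camera)
clip = transform(in_camera, projection)  # 3) PROJECTION: what the vertex shader outputs
ndc = [c / clip[3] for c in clip[:3]]    # homogeneous divide, done after the shader
screen = viewport_transform(ndc, 0, 0, 800, 600)  # 4) VIEWPORT

print(screen)  # roughly (460, 240, 0.808)
```

Note that clip[3] (the W component) ends up holding the distance in front of the camera, which is why the divide produces perspective foreshortening.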

To go "backwards," you can multiply by the inverse matrix. Going forwards, from a normal model or tile, where 0,0,0 is in the center of the base of the thing, to projected space, you typically do oVertex = iVertex * WORLD * VIEW * PROJECTION. When you move the object, you change WORLD. When you move the camera, you change VIEW. When you change the camera lens, you change PROJECTION. When the window resizes, you change VIEWPORT.
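As a small sketch of oVertex = iVertex * WORLD * VIEW * PROJECTION, the three matrices can be multiplied together once and the combined matrix applied to every vertex. The Python below is illustrative only: the helper names and numbers are invented for this example, and it assumes the row-vector convention with a right-handed projection.

```python
import math

def mat_mul(a, b):
    """4x4 matrix product a * b."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def transform(v, m):
    """Row vector times 4x4 matrix."""
    return [sum(v[k] * m[k][c] for k in range(4)) for c in range(4)]

def translation(tx, ty, tz):
    return [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [tx, ty, tz, 1]]

def perspective_rh(fov_y, aspect, near, far):
    f = 1.0 / math.tan(fov_y / 2.0)
    return [[f / aspect, 0, 0, 0],
            [0, f, 0, 0],
            [0, 0, far / (near - far), -1],
            [0, 0, near * far / (near - far), 0]]

world = translation(0, 0, 0)
view = translation(0, 0, -5)   # camera at (0, 0, 5), looking down -Z
projection = perspective_rh(math.radians(90), 800 / 600, 1.0, 100.0)

i_vertex = [1.0, 1.0, 0.0, 1.0]

# Combine once, apply to every vertex.
wvp = mat_mul(mat_mul(world, view), projection)
one_step = transform(i_vertex, wvp)

# Same result as applying the three matrices one after another.
three_steps = transform(transform(transform(i_vertex, world), view), projection)

# Moving the object only changes WORLD; VIEW and PROJECTION are reused.
world_moved = translation(2, 0, 0)
wvp_moved = mat_mul(mat_mul(world_moved, view), projection)
moved = transform(i_vertex, wvp_moved)
```

Because matrix multiplication is associative, the combined matrix gives the same result as the step-by-step version, and only the matrix that actually changed needs to be rebuilt before recombining.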

If you have a coordinate in window (VIEWPORT) coordinates, and you want to go to WORLD space, you have to invert the VIEWPORT transform, then invert the PROJECTION transform, then invert the VIEW transform. Typically, you assume that the "Z" value in viewport coordinates is 0 (which puts the point on the near plane), and project back to WORLD space, then you form a ray from the camera center through this point and check what you hit in the world.
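The backwards path can be sketched like this. It is an illustration in plain Python under stated assumptions, not the XNA implementation: the helpers (mat_inverse, unproject) and the scene values are hypothetical, the viewport depth range is assumed to be 0 ... 1, and the "world" being hit is simply the plane Z = 0.

```python
import math

def mat_mul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def transform(v, m):
    """Row vector times 4x4 matrix."""
    return [sum(v[k] * m[k][c] for k in range(4)) for c in range(4)]

def translation(tx, ty, tz):
    return [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [tx, ty, tz, 1]]

def perspective_rh(fov_y, aspect, near, far):
    f = 1.0 / math.tan(fov_y / 2.0)
    return [[f / aspect, 0, 0, 0],
            [0, f, 0, 0],
            [0, 0, far / (near - far), -1],
            [0, 0, near * far / (near - far), 0]]

def mat_inverse(m):
    """Invert a 4x4 matrix by Gauss-Jordan elimination with partial pivoting."""
    n = 4
    a = [[float(x) for x in m[r]] + [1.0 if r == c else 0.0 for c in range(n)]
         for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]

def unproject(sx, sy, sz, vp_x, vp_y, vp_w, vp_h, view, projection):
    """Window coordinates -> world space (depth range assumed to be 0 ... 1)."""
    # Invert the VIEWPORT transform: pixels back to the -1 ... 1 box.
    ndc = [(sx - vp_x) / vp_w * 2.0 - 1.0,
           1.0 - (sy - vp_y) / vp_h * 2.0,
           sz,
           1.0]
    # Invert PROJECTION and VIEW in one go, then undo the homogeneous divide.
    p = transform(ndc, mat_inverse(mat_mul(view, projection)))
    return [p[0] / p[3], p[1] / p[3], p[2] / p[3]]

camera_pos = [0.0, 0.0, 5.0]
view = translation(0, 0, -5)
projection = perspective_rh(math.radians(90), 800 / 600, 1.0, 100.0)

# Mouse position (460, 240) with Z = 0, i.e. on the near plane.
near_point = unproject(460, 240, 0.0, 0, 0, 800, 600, view, projection)

# Ray from the camera center through that point.
direction = [n - c for n, c in zip(near_point, camera_pos)]

# "Check what you hit": here, intersect the ray with the plane Z = 0.
t = -camera_pos[2] / direction[2]
hit = [c + t * d for c, d in zip(camera_pos, direction)]
```

With these example values the ray lands on the world-space point (1, 1, 0), the same vertex that projects to pixel (460, 240) when run forwards through the pipeline. In a real game the plane test would be replaced by whatever picking test fits your world (bounding spheres, terrain, and so on).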

Luckily, in XNA, the Viewport class provides a convenient function for you: Viewport.Unproject(). It can be used for exactly this purpose, without you having to do all the inverse matrix math yourself!

More details can be found on MSDN, as usual:
The Direct3D Transformation Pipeline
Microsoft.Xna.Framework.Graphics.Viewport