In OpenGL parlance, the projection matrix, whether orthographic or perspective, takes you from View Space (a.k.a. Camera Space) to Homogeneous Clip Space. After clipping and the perspective divide you arrive at Normalized Device Coordinates (NDC); these points are still 3D, $(x_p, y_p, z_{vp})$. Finally, the viewport transform maps NDC to viewport coordinates (a.k.a. screen coordinates, window coordinates, or non-normalized device coordinates), which are 2D, $(x_p, y_p)$.
May I suggest reading: OpenGL Transformation
And with respect to your final question: no, perspective "projectors" (projected rays) are not all perpendicular to the viewing plane; that would require a curved viewing surface. If all the rays projected from a single point were perpendicular to the viewing surface, that surface would be part of a sphere centered on that point. Exactly one line passes through the center of projection and meets the image plane at a right angle: the axis of projection, which meets the viewing plane at the principal point.
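A quick numeric check of that claim, as a minimal sketch: it assumes the center of projection is at the origin and the image plane is $z = -d$ (the usual OpenGL eye-space convention), and the helper name `ray_angle_to_axis_deg` is hypothetical. Only the ray through the principal point $(0, 0)$ makes a 0° angle with the plane's normal; every other projector is tilted.

```python
import math

def ray_angle_to_axis_deg(x, y, d=1.0):
    """Angle (degrees) between the projector through image-plane point (x, y, -d)
    and the plane's normal, assuming the center of projection is at the origin."""
    # Ray direction is (x, y, -d); the plane normal is (0, 0, -1).
    length = math.sqrt(x * x + y * y + d * d)
    cos_theta = d / length  # dot(ray, normal) / |ray|, with |normal| = 1
    return math.degrees(math.acos(cos_theta))

for x, y in [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.0, 1.0)]:
    print((x, y), round(ray_angle_to_axis_deg(x, y), 2))
```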
More info:
Transformation just means change. With respect to 3D points, it usually means the output bears some resemblance to the input: translation, rotation, scale, skew, and even projection are all kinds of transformation. If the output can be transformed back to the input, the transformation is invertible. Projection is a specific kind of transformation that moves all the input points onto a plane, either to the closest point on the plane (orthographic) or along lines passing through a common center of projection (perspective). Projection is not invertible.
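Both kinds of projection onto a plane can be sketched in a few lines. This is an illustration only, assuming the orthographic target plane is $z = 0$, the perspective target plane is $z = -d$, and the center of projection is the origin; the function names are hypothetical. It also shows why projection is not invertible: distinct inputs land on the same output point.

```python
def ortho_project(p):
    """Orthographic projection onto the plane z = 0: move to the closest point."""
    x, y, z = p
    return (x, y, 0.0)

def perspective_project(p, d=1.0):
    """Perspective projection onto the plane z = -d along the line through the
    center of projection (assumed to be the origin). Requires z < 0."""
    x, y, z = p
    s = -d / z  # scale factor that slides the point along its ray onto the plane
    return (x * s, y * s, -d)

# Distinct inputs, identical outputs: the original depth cannot be recovered.
print(ortho_project((1.0, 2.0, 3.0)), ortho_project((1.0, 2.0, -7.0)))
print(perspective_project((1.0, 2.0, -2.0)), perspective_project((2.0, 4.0, -4.0)))
```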
A transformation is said to be linear if it can be represented by a matrix multiply.
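Strictly speaking, translation is not linear on 3D points, because it moves the origin. It becomes a single matrix multiply once points are written in homogeneous coordinates with a fourth component $w$, which is the trick the rest of the pipeline relies on. A minimal sketch with hypothetical helper names and plain Python lists (rows of a 4x4 matrix):

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def translation(tx, ty, tz):
    """Translation expressed as a 4x4 matrix acting on homogeneous coordinates."""
    return [
        [1.0, 0.0, 0.0, tx],
        [0.0, 1.0, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]

T = translation(10.0, 0.0, 0.0)
point = [1.0, 2.0, 3.0, 1.0]      # w = 1: a position, which gets translated
direction = [1.0, 2.0, 3.0, 0.0]  # w = 0: a direction, which is unaffected
print(mat_vec(T, point))      # [11.0, 2.0, 3.0, 1.0]
print(mat_vec(T, direction))  # [1.0, 2.0, 3.0, 0.0]
```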
Here's the confusing part. To perform the perspective transformation, one must perform a non-linear step: dividing by the depth (in OpenGL, by the clip-space $w$ component, which a perspective projection matrix sets to $-z_{eye}$). A division like this can't be represented by a matrix multiply. To get around this, we transform all the coordinates linearly into what is called clip space, and the pipeline then applies a non-linear transformation from clip space to normalized device coordinates by performing the division. The so-called PROJECTION MATRIX (in OpenGL) is actually just a linear transformation from camera/eye coordinates to clip coordinates. It's only the first of the two steps required to perform an actual projection transformation; the projection is completed in the second step by doing the divide, which arrives at NDC. The linear part of the transformation produces the numerator and the denominator of the quotient in separate coordinates, and the non-linear part divides.
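The two steps can be sketched as follows. This assumes the matrix layout documented for `gluPerspective` (right-handed eye space looking down $-z$); the helper names `perspective`, `to_clip`, and `to_ndc` are hypothetical. Step 1 is a plain matrix multiply that writes the numerators into $x, y, z$ and the denominator ($-z_{eye}$) into $w$; step 2 is the divide.

```python
import math

def perspective(fovy_deg, aspect, near, far):
    """4x4 perspective matrix (list of rows) in the gluPerspective layout."""
    f = 1.0 / math.tan(math.radians(fovy_deg) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]

def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def to_clip(proj, eye_xyz):
    """Step 1 (linear): eye space -> clip space. w_clip ends up equal to -z_eye."""
    return mat_vec(proj, [*eye_xyz, 1.0])

def to_ndc(clip):
    """Step 2 (non-linear): the perspective divide, clip space -> NDC."""
    x, y, z, w = clip
    return (x / w, y / w, z / w)

P = perspective(90.0, 1.0, 1.0, 100.0)
clip = to_clip(P, (1.0, 0.0, -2.0))
print(clip[3])       # 2.0, i.e. -z_eye
print(to_ndc(clip))  # x_ndc is approximately 0.5
```

With this layout, eye-space depths $z = -near$ and $z = -far$ land on $z_{ndc} = -1$ and $z_{ndc} = +1$ respectively.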
Interestingly, the linear transformation from camera coordinates to clip coordinates (as given by the PROJECTION MATRIX) is invertible. Even after the perspective divide it's still invertible, provided we retain the Z value. But once we discard the Z value and go to plain screen coordinates, we can no longer invert the 2D screen coordinates back into the original 3D coordinates. Similarly, if we actually transformed all the points onto the projection plane (still in 3D), we wouldn't be able to invert that transformation back into the original 3D points: information has been lost, because every point has been flattened to zero extent in the "depth" direction.
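The invertibility claim can be checked directly. A minimal sketch, again assuming the `gluPerspective` matrix layout written out row by row; `project` and `unproject` are hypothetical names. Because $z_{ndc} = (a\,z + b)/(-z)$ with $a = \frac{far + near}{near - far}$ and $b = \frac{2\,far\,near}{near - far}$, the eye-space depth is recovered as $z = -b/(z_{ndc} + a)$, and from it $w = -z$ and then $x$ and $y$.

```python
import math

def _coeffs(fovy_deg, near, far):
    f = 1.0 / math.tan(math.radians(fovy_deg) / 2.0)
    a = (far + near) / (near - far)
    b = 2.0 * far * near / (near - far)
    return f, a, b

def project(eye, fovy_deg, aspect, near, far):
    """Eye space -> NDC (matrix multiply followed by the perspective divide)."""
    f, a, b = _coeffs(fovy_deg, near, far)
    x, y, z = eye
    w = -z
    return (f / aspect * x / w, f * y / w, (a * z + b) / w)

def unproject(ndc, fovy_deg, aspect, near, far):
    """NDC -> eye space. Only possible because z_ndc was kept."""
    f, a, b = _coeffs(fovy_deg, near, far)
    xn, yn, zn = ndc
    z = -b / (zn + a)  # solve z_ndc = (a*z + b) / (-z) for z
    w = -z
    return (xn * w * aspect / f, yn * w / f, z)

params = (60.0, 1.5, 0.1, 100.0)  # fovy, aspect, near, far (example values)
p = (0.3, -0.2, -5.0)
print(unproject(project(p, *params), *params))  # approximately (0.3, -0.2, -5.0)

# Two points on the same ray share x_ndc and y_ndc; only z_ndc tells them apart.
a_ndc = project((1.0, 1.0, -2.0), *params)
b_ndc = project((2.0, 2.0, -4.0), *params)
print(a_ndc, b_ndc)
```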
Even after dividing, we're still in 3-dimensional coordinates. The third coordinate is still useful for writing to your depth buffer.
Only when we are ready to compute screen coordinates do we drop the Z value and use just 2-dimensional coordinates.
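Those last two steps correspond to the viewport and depth-range mapping. The sketch below follows the formulas the OpenGL specification gives for `glViewport` and `glDepthRange` (default depth range $[0, 1]$); the function name is hypothetical. The depth value goes to the depth buffer, and only $(x_w, y_w)$ identifies the pixel.

```python
def viewport_transform(ndc, x0, y0, width, height, depth_near=0.0, depth_far=1.0):
    """NDC -> window coordinates, following the glViewport / glDepthRange mapping."""
    xn, yn, zn = ndc
    xw = (xn + 1.0) * (width / 2.0) + x0
    yw = (yn + 1.0) * (height / 2.0) + y0
    zw = (depth_far - depth_near) / 2.0 * zn + (depth_far + depth_near) / 2.0
    return xw, yw, zw

xw, yw, zw = viewport_transform((0.0, 0.5, -1.0), 0, 0, 800, 600)
print(xw, yw)  # 400.0 450.0: the 2D screen position
print(zw)      # 0.0: the value that would be written to the depth buffer
```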
Other transformations, such as perspective (projective) texture mapping, for example to simulate a slide projector, follow a similar path and include a non-linear divide step to arrive at the texture coordinates (u, v).
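A minimal sketch of that divide for a simulated projector, under loudly stated assumptions: the point is already expressed in the projector's own eye space, the projector uses the $x$, $y$, and $w$ rows of a `gluPerspective`-style matrix (with a hypothetical default of $f = 1$, i.e. a 90° field of view, and aspect 1), and the $[-1, 1] \to [0, 1]$ bias is applied after the divide. `projector_uv` is a hypothetical name; on the GPU, GLSL's `textureProj` performs the divide by the last texture coordinate for you.

```python
def projector_uv(point_in_projector_eye, f=1.0, aspect=1.0):
    """Projective texture coordinates for a hypothetical projector."""
    x, y, z = point_in_projector_eye
    # Linear part: numerators (s, t) and denominator (q), like clip-space x, y, w.
    s = f / aspect * x
    t = f * y
    q = -z
    # Non-linear part: divide, then bias from [-1, 1] to [0, 1] texture space.
    u = 0.5 * (s / q) + 0.5
    v = 0.5 * (t / q) + 0.5
    return u, v

print(projector_uv((1.0, 0.0, -2.0)))  # (0.75, 0.5)
print(projector_uv((2.0, 0.0, -4.0)))  # same (u, v): same projector ray
```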
So, to answer your question: the pipeline keeps invertible 3D coordinates around as long as possible, deferring the divide and discarding the Z information as late as possible, so that things stay linear for as much of the pipeline as possible (thanks to homogeneous coordinates) and invertible for as long as possible (by retaining the depth coordinate in NDC space).