This question cannot be answered in the form in which you asked it, for two reasons. First, current neuroscientific theories and methodologies are limited in their ability to determine the structure of complex neural representations (although headway has been made in a few cases, such as place cells and grid cells). Second, neural representations are not really analogous to our colloquial concepts of simple mappings from numbers to images (e.g., the pixel, vector, or wavelet bases used for digital image representation).
I would encourage you to think not only in terms of the representational formats privileged by von Neumann-style computer architectures (i.e., ordinary computers) with separate processing and digital memory systems, but also in terms of a neural computer with distributed computation and memory. Neuroscientists typically think of neural representation in visual perception as occurring at a series of stages of increasing abstraction. At the retina, the image is represented roughly in terms of what you might think of as pixels, though it is really just a set of photoreceptors and attached neurons activated by light hitting the retina at different places, frequencies, and amplitudes. In V1, or primary visual cortex, the representation is in terms of what can be thought of as "edge detectors." In V2, cells are tuned to a variety of slightly more complex properties such as orientation, spatial frequency, color, and binocular disparity (an important cue for 3D perception). As you go up through V3, V4, V5, and V6, the representations become more and more complex, until they carry information about the structure of the underlying concept itself.
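To make the "edge detector" idea concrete, here is a toy sketch (purely illustrative, not a model of real V1 physiology) of how a simple cell can be modeled as a small filter convolved with the retinal "pixel" array. The image and kernel below are made up:

```python
import numpy as np

# Toy "retina": a 5x5 image whose right half is bright (a vertical edge).
image = np.array([
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
], dtype=float)

# A crude vertical-edge "receptive field" (Sobel-like kernel):
# excitatory on one side, inhibitory on the other.
kernel = np.array([
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
], dtype=float)

def respond(img, rf):
    """Slide the receptive field over the image; each output entry is
    one model cell's activation (a valid-mode convolution)."""
    h, w = rf.shape
    out = np.zeros((img.shape[0] - h + 1, img.shape[1] - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + h, j:j + w] * rf)
    return out

responses = respond(image, kernel)
print(responses)  # cells whose field straddles the edge respond strongly
```

Cells whose receptive fields cover the luminance boundary fire strongly (value 4 here), while cells over uniform regions stay silent (value 0), which is the sense in which the population "represents" the edge.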
Three-dimensionality does not come in until the visual information from both eyes is combined with top-down information about how things tend to be in the world (which also comes from other senses and prior experience) to infer the likely shape of the object. It is unclear whether there is any place in the brain where an image is represented explicitly as a 3D model of a visual object. More likely, the raw perceptual data such as colors and patterns are represented in one area, a more object-centric structural representation is held in another, and the association between the two causes the co-activation of all the neurons relevant to perceiving the object in its three-dimensional form. The information is neither represented in one area of the brain nor in one scheme; it is distributed across a large number of neurons with different degrees of activation.
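The "association causes co-activation" idea can be sketched with a classic Hopfield-style associative memory, which is a drastic simplification of real distributed representation. One "object" is a pattern of activation spread across all the units; cueing with a partial pattern re-activates the whole. The pattern and network size here are arbitrary:

```python
import numpy as np

# One "object" as a distributed pattern of +1/-1 activations
# across 8 model neurons (illustrative only).
pattern = np.array([1, -1, 1, 1, -1, -1, 1, -1])

# Hebbian weight matrix: units that fire together become linked.
W = np.outer(pattern, pattern).astype(float)
np.fill_diagonal(W, 0)  # no self-connections

# Cue with a corrupted version of the pattern: flip two units.
cue = pattern.copy()
cue[0] = -cue[0]
cue[3] = -cue[3]

# Update repeatedly: each unit adopts the sign of its weighted input,
# so the associative links pull the whole population back to the
# stored pattern (pattern completion / co-activation).
state = cue.astype(float)
for _ in range(5):
    state = np.sign(W @ state)

print(np.array_equal(state, pattern))  # True
```

No single unit "is" the object; the object corresponds to the joint state of the whole population, which is the sense of "distributed" used above.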
We can also think of this in terms of the contemporary theory of enactive perception, which holds that the brain represents information in terms of state-action-observation contingencies. In this theory there is no "image" present at all, only a set of neural activations that correlate with prior experience (where the correlations are stored as associative links between neurons, creating a causal activation path along which "information" travels) and relationships between potential actions and their likely consequences. The actions can be low-level motor actions of the retina or other muscles, or higher-level actions such as taking a step forward; the consequences range from low-level perceptual ones, such as the expected changes in sensory data when you move your head, to higher-level perceptual and conceptual ones, such as a cup falling (and the associated experiences) when you choose to open your hand. In this more comprehensive picture, visual information cannot be strictly separated from conceptual and multi-sensory information, so three-dimensional visual information is represented both by neural activations causally associated with the structure of the perceived object and by the activations associated with all the related conceptual beliefs and perceptual expectations.
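As a toy illustration of state-action-observation contingencies (a caricature, not a model anyone proposes for real brains), one can store which observation tends to follow each state-action pair and "perceive" by predicting consequences. The state and action labels below are hypothetical:

```python
from collections import defaultdict, Counter

class ContingencyModel:
    """Tally which observations follow each (state, action) pair,
    then predict the most frequent consequence."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def experience(self, state, action, observation):
        self.counts[(state, action)][observation] += 1

    def predict(self, state, action):
        outcomes = self.counts[(state, action)]
        return outcomes.most_common(1)[0][0] if outcomes else None

model = ContingencyModel()
# "If I am holding a cup and open my hand, the cup falls."
model.experience("holding_cup", "open_hand", "cup_falls")
model.experience("holding_cup", "open_hand", "cup_falls")
model.experience("holding_cup", "tilt_hand", "cup_spills")

print(model.predict("holding_cup", "open_hand"))  # cup_falls
```

On this view, "seeing" the cup amounts to having these expectations active, rather than consulting a stored picture of it.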
If you want to read more, these sources should be a good start: distributed neural memory, distributed neural representation of a higher order process, perception as associations between actions and observations
There are also a few sources that offer direct experimental evidence bearing on your question, though they do not provide as much of a conceptual introduction to neural computation and representation. See "Neural computations underlying depth perception" and "Binocular depth perception and the cerebral cortex" as a start.
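For the geometric intuition behind the depth-perception work cited above: binocular disparity relates to depth by simple triangulation, Z = f·B/d, where f is focal length, B the separation between the two viewpoints, and d the disparity. A minimal sketch with made-up numbers loosely standing in for human eye geometry:

```python
def depth_from_disparity(f, baseline, disparity):
    """Pinhole stereo triangulation: depth Z = f * B / d.
    All quantities in the same units (meters here)."""
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    return f * baseline / disparity

# Illustrative values only: ~17 mm focal length, ~65 mm eye separation.
near = depth_from_disparity(f=0.017, baseline=0.065, disparity=0.001)
far = depth_from_disparity(f=0.017, baseline=0.065, disparity=0.0001)
print(near < far)  # True: larger disparity means a closer object
```

The inverse relationship (large disparity, small depth) is the raw signal that the disparity-tuned cells mentioned earlier are sensitive to; the brain does not literally evaluate this formula.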
Your question also refers to rotating an object "in the mind's eye." There is a very large body of research on this phenomenon, which is called "mental rotation." The Wikipedia article is a good place to start, but there is a lot more to say about this interesting phenomenon, so feel free to ask a related question if you are curious!
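For contrast, this is what explicit 3D rotation looks like on a conventional computer: a rotation matrix applied to vertex coordinates. There is no evidence that mental rotation works this literally; the sketch only shows the von Neumann version of the operation that the mental-rotation literature asks the brain-analog of:

```python
import numpy as np

def rotation_z(theta):
    """Matrix rotating 3D points by angle theta about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0],
                     [s,  c, 0],
                     [0,  0, 1]])

# Rotate one vertex of an object by 90 degrees.
cube_corner = np.array([1.0, 0.0, 0.0])
rotated = rotation_z(np.pi / 2) @ cube_corner
print(np.round(rotated, 6))  # [0. 1. 0.]
```

The classic behavioral finding, that response time grows with rotation angle, is often read as evidence that the brain does something analogous to sweeping through intermediate orientations rather than jumping straight to the result of one matrix multiply.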