
Adobe Illustrator has taken over five minutes (and counting) to render a vector 2D image rotated 18° in 3D on my computer. And yet, I and nearly anyone else can easily visualize the subject rotated almost instantaneously, and with little effort rotate the object continuously in real time in the mind's eye.

[Image: spinning flamingo]

I'm not asking how the brain stores representations of objects, as that's clearly up for debate. But how does the brain structure its internal representation of 3D visual data?

It's almost certainly not a pixel-based format, as can be shown by simply visualizing an object, mentally zooming in on some detail, and noticing that the image retains its sharpness. It's probably not breaking objects down into geometric shapes either, because at least I personally don't visualize my friends as stick figures. It could be a vector format, but if it were, complex shapes that are mathematically simple should be easy to visualize, like this one:

[Image: a complex but mathematically simple shape]
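For concreteness, here is the kind of distinction I mean, as a toy sketch in code (an analogy about data formats, not a claim about neural mechanisms): zooming a raster image cannot create detail, while a parametric, vector-like description can be re-rendered sharply at any scale.

```python
import math

# Raster: a tiny 2x2 "image"; zooming just repeats existing samples.
raster = [[0, 1],
          [1, 0]]

def raster_zoom(img, factor):
    """Nearest-neighbour upscaling: no new information is created."""
    return [[img[r // factor][c // factor]
             for c in range(len(img[0]) * factor)]
            for r in range(len(img) * factor)]

# Parametric: a circle described by an equation; sampling it more finely
# at a larger radius yields genuinely sharp detail at any "zoom" level.
def circle_points(radius, n_samples):
    return [(radius * math.cos(2 * math.pi * k / n_samples),
             radius * math.sin(2 * math.pi * k / n_samples))
            for k in range(n_samples)]

zoomed = raster_zoom(raster, 2)      # 4x4, but still only 4 distinct samples
coarse = circle_points(1.0, 8)       # rough octagon approximation
fine = circle_points(100.0, 1024)    # "zoomed in": still a smooth circle
```

Mental imagery behaves more like the second case, which is part of what makes the pixel hypothesis implausible.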

So it would seem that the brain uses some other format. To the best of modern cognitive science's knowledge, how does this work?

  • Our visual image is in 2D, similar to a normal camera, but there are many factors that let us interpret the world as 3D. These include, among others, the size of objects (farther objects are usually smaller), the ordering of objects (some objects being behind others), and relative movement (closer objects move faster across the 2D plane than farther ones). Given our many specialized visual areas (V1-V6 etc.), this in most cases happens unconsciously. P.S. The fact that you don't use stick figures consciously does not mean your brain doesn't. Commented Jun 13, 2016 at 21:07
  • @RobinKramer I'm not really asking about vision; the question is about how the brain holds visual data. Picture your best friend. You can probably spin him or her around, raise or lower arms, legs and head, and otherwise construct movies in your head with this person. That kind of grasp of 3D objects is difficult for computers to achieve, and yet we (and likely at least a few mammals) can do it naturally, effortlessly, and effectively instantly. Without any need to go into the electrochemical process by which objects are stored (unless you really want to), in what "format" is this data? Commented Jun 13, 2016 at 21:20
  • Just because you think you are mentally simulating a 3D object doesn't mean the object is actually 3D in your head. That would require a large amount of perceptual information. Typically, the majority of stored information is based on global percepts, with local percepts allowing for more specific identification (a global precedence bias). So, given that the visual information you're manipulating is not structurally accessible (as in drawing), you are manipulating the percepts you have previously stored (as required to identify the object).
    – Dog
    Commented Jun 14, 2016 at 13:15
  • @Dog the notion that visual data is mentally 2D but interpreted using rules accustomed to 3D would seem contrary to simple experience. It's very easy to picture an object and describe its shape in three dimensions, but describing its shape in two dimensions from any particular angle is difficult (think of a car: easy to visualize in three dimensions, hard to outline in two). I would have to say I disagree. Commented Jun 14, 2016 at 18:52
  • 1
    Nice question! I think the speculation and mystery around how we do this is a good example of why human vision/imagery is such a hard problem for cognitive science. A couple of points: (1) Introspection is not great evidence for how this happens in the brain. It doesn't feel pixel- or shape-based to you, but these mechanisms are probably not open to conscious introspection. (2) The type of "format"/representation is a key question in models of object recognition, so you might like to look at work by Tarr and Biederman and the debate between "view-dependent" and "view-independent" models.
    – splint
    Commented Jun 15, 2016 at 10:56

1 Answer


This question cannot be answered in the form in which you asked it, for two reasons. First, current neuroscientific theories and methodologies are limited when it comes to determining the structure of complex neural representations (although we have made headway in a few cases, such as place cells and grid cells). Second, neural representations are not really analogous to our colloquial concepts of simple mappings from numbers to images (e.g. pixel, vector, or wavelet bases for digital image representation).

I would encourage you to think not only in terms of the representational formats privileged by von Neumann-style computer architectures (i.e. ordinary computers, with separate processing and memory systems), but in terms of a neural computer with distributed computation and memory. Neuroscientists typically think of neural representation in visual perception as occurring at a number of stages of increasing abstraction. At the retina, the image is represented roughly in terms of what you might think of as pixels, though it is really just a set of photoreceptors and attached neurons activated by light hitting the retina at different places, frequencies, and amplitudes. In V1, or primary visual cortex, the representation is in terms of what can be thought of as "edge detectors." In V2, cells are tuned for a variety of slightly more complex properties such as orientation, spatial frequency, color, and binocular disparity (an important piece of information for 3D perception). As you go up through V3, V4, V5, and V6, the representations become more and more complex, until they contain information about the structure of the underlying concept itself.
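The "edge detector" idea can be made concrete with a toy computational analogy (V1 simple cells are better modelled by Gabor filters; an oriented convolution kernel is just the simplest stand-in, and everything here is illustrative rather than a neural model):

```python
# A 5x5 intensity "image" with a vertical edge between columns 2 and 3.
image = [[0, 0, 0, 1, 1] for _ in range(5)]

# A Sobel-like kernel tuned to vertical edges.
kernel = [[-1, 0, 1],
          [-2, 0, 2],
          [-1, 0, 1]]

def convolve(img, ker):
    """Valid 2D convolution: the output is large where the kernel's
    preferred orientation matches the local image structure."""
    kh, kw = len(ker), len(ker[0])
    out = []
    for r in range(len(img) - kh + 1):
        row = []
        for c in range(len(img[0]) - kw + 1):
            row.append(sum(img[r + i][c + j] * ker[i][j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out

response = convolve(image, kernel)
# The response is zero over uniform regions and peaks at the edge,
# loosely analogous to an orientation-tuned cell firing for its
# preferred stimulus.
```

A population of such units at many orientations and positions gives a distributed code for contours, rather than anything resembling a stored picture.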

The 3-dimensionality does not come in until the visual information from both eyes is combined with top-down information about how things tend to be in the world (which comes from other senses and experiences as well) in order to infer the likely shape of the object. It is unclear whether there is any place in the brain where an image is represented explicitly as a 3D model of a visual object. More likely, raw perceptual data such as colors and patterns is represented in one area and associated with a more object-centric structural representation in another, and this association causes the co-activation of all the neurons relevant to perceiving the object in its three-dimensional form. The information is neither represented in one area of the brain nor in one scheme, but is distributed across a large number of neurons with different degrees of activation.
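The geometry underlying binocular depth inference is worth making explicit. In textbook stereo vision (a geometric idealization, not a claim about how neurons compute it), depth is inversely proportional to disparity: Z = f * B / d, where f is the focal length, B the interocular baseline, and d the disparity. The numeric values below are rough, illustrative approximations:

```python
def depth_from_disparity(focal_length, baseline, disparity):
    """Textbook stereo relation: Z = f * B / d."""
    if disparity <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length * baseline / disparity

f = 0.017   # ~17 mm effective focal length of the human eye (metres, rough)
B = 0.065   # ~65 mm interocular distance (metres, rough)

near = depth_from_disparity(f, B, 0.0011)  # large disparity -> near object
far = depth_from_disparity(f, B, 0.0001)   # small disparity -> far object
# Disparity shrinks with distance, so near < far.
```

This also shows why disparity alone degrades as a cue at large distances: small disparities map to a huge range of depths, which is one reason the top-down priors mentioned above matter so much.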

We can also think of this in terms of the contemporary theory of enactive perception, which holds that the brain represents information in terms of state-action-observation contingencies. In this theory there is no "image" present at all, only a set of neural activations that correlate with prior experience (where the correlations are stored as associative links between neurons, creating a causal activation path along which "information" travels) and relationships between potential actions and their likely consequences. The actions may be low-level motor actions of the retina or other muscles, or higher-level actions such as taking a step forward; the consequences range from low-level perceptual ones, such as the expected changes in sensory data when the head moves, to higher-level perceptual and conceptual ones, such as a cup falling when you choose to open your hand, and the associated experiences. In this more comprehensive picture, visual information cannot be strictly separated from conceptual and multi-sensory information, so three-dimensional visual information is represented both in the neural activations causally associated with the structure of the perceived object and in the activations associated with all the related conceptual beliefs and perceptual expectations.
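A deliberately crude sketch of the enactive idea: knowledge of an object is stored as learned predictions about how sensory input changes under actions, rather than as a static 3D model. All the state, action, and observation names below are invented for illustration:

```python
# Learned (state, action) -> predicted-observation contingencies for a cup.
contingencies = {
    ("cup_front", "rotate_left"): "cup_side_handle_visible",
    ("cup_side_handle_visible", "rotate_left"): "cup_back",
    ("cup_front", "open_hand"): "cup_falling",
}

def predict(state, action):
    """Perception-as-prediction: return the expected observation, or
    None if no contingency has been learned for this state-action pair."""
    return contingencies.get((state, action))

# "Mentally rotating" the cup is then just chaining predictions:
view = "cup_front"
for _ in range(2):
    view = predict(view, "rotate_left")
# After two leftward rotations the predicted view is the back of the cup,
# without any explicit 3D geometry ever being stored.
```

On this view, the vividness of mental rotation reflects how well-practiced these sensorimotor predictions are, not the presence of an internal 3D mesh.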

If you want to read more, these sources should be a good start: distributed neural memory, distributed neural representation of a higher-order process, and perception as associations between actions and observations.

There are also a few sources that offer direct experimental evidence bearing on your question, though they do not provide as much of a conceptual introduction to neural computation and representation. See "Neural computations underlying depth perception" and "Binocular depth perception and the cerebral cortex" as a start.

Your question also makes reference to rotating an object "in the mind's eye." There is a very large body of research on this phenomenon, which is called "mental rotation." The Wikipedia article is a good place to start, but there is a lot more to say about this interesting phenomenon, so feel free to ask a related question if you are curious!
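Incidentally, the operation your graphics software is struggling with is, in the abstract, just a rotation matrix applied to each point, shown here for the 18° rotation from your question (how the brain achieves the analogous transformation is precisely what mental-rotation research investigates; this is the geometry, not the neuroscience):

```python
import math

def rotate_y(point, degrees):
    """Rotate a 3D point about the vertical (y) axis."""
    t = math.radians(degrees)
    x, y, z = point
    return (x * math.cos(t) + z * math.sin(t),
            y,
            -x * math.sin(t) + z * math.cos(t))

# Three landmark points standing in for a shape.
shape = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
rotated = [rotate_y(p, 18) for p in shape]
# Points on the rotation axis are unchanged; the rest sweep around it.
```

A classic finding here (Shepard & Metzler, 1971) is that the time people need to judge whether two shapes are rotated copies of each other grows roughly linearly with the rotation angle, as if the mental transformation were itself continuous.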

  • $\begingroup$ This is a fantastic answer, and I especially like the enactive perception interpretation, which would imply, for example, that the visual data attached to a human being is a combination of 2D data, 3D data, and general information about the shapes of humans in general, which can be used to build a cued rough (and yet quite accurate) mental model of structures that aren't stored in their entirety in 3D visuals as one would naïvely think. The links are also quite excellent. Feel free to add more! $\endgroup$ Commented Jun 20, 2016 at 23:51
  • $\begingroup$ Thank you! I appreciate the feedback and I'm glad to be able to contribute to your thoughts about this. I may be writing a blog post of this sort in the near future. I'll link you if and when I do. $\endgroup$ Commented Jun 22, 2016 at 21:14
  • $\begingroup$ It seems I'm still waiting on the science to catch up :) Did you ever write that blog post? $\endgroup$ Commented Oct 21, 2020 at 20:55
