Gschwind, Markus (2007): Untersuchungen zur Kontextabhängigkeit der visuellen Objekterkennung [Investigations into the context dependence of visual object recognition]. Dissertation, LMU München: Medizinische Fakultät (Faculty of Medicine)



Recognizing a 3D object from 2D images seems easy and occurs every day, but it is in principle an ill-defined problem. Here we report two behavioural experiments investigating how representations of 3D objects are built under varying conditions of the recognition context, i.e. the set of test objects, the prior object knowledge and the amount of stimulus input information. We used a classification paradigm with initially unknown, “structure only” classification criteria, consisting of a category learning part and a subsequent generalization test. The acquisition of prior object knowledge was varied between an intra-modal (visual) group and a trans-modal (blindfolded motor) group. The results show that subjects are able to build structure-based object representations from desultory 2D images. We observe a clear context dependency: the task was much more difficult without prior object knowledge (control group) and for images with poor stimulus input information. If alignment cues are only sparsely available in the 2D image, so that the 3D object structure is ambiguous, this lack of visual information can be compensated trans-modally by prior motor knowledge, which seems to be transferred in real time to the view-dependent representations. The results further suggest the existence of several steps in the recognition process: 1. Image understanding, with generation of a hypothetical object model. 2. Transformation and rotation of this model in order to match it to the image input signal (matching-to-fit cycle). 3. If step 2 is unsuccessful (e.g. for mirror-symmetric objects), the hypothetical object model is referred to an external reference. These three steps are repeated until the object is successfully recognized (non-exhaustive search). This three-step process can integrate the different, so far contradictory theories of object recognition, in that each seems to describe one portion of the recognition process.