Sharifzadehgolpayegani, Sahand (2023): On the importance of symbol grounding and top-down processes in computer vision. Dissertation, LMU München: Faculty of Mathematics, Computer Science and Statistics
PDF: Sharifzadehgolpayegani_Sahand.pdf (9 MB)
Abstract
In the past decade, feedforward artificial neural networks have taken the field of artificial intelligence by storm and shown impressive results in many domains. Nevertheless, one of the remaining challenges in artificial intelligence is connecting the differentiable feature space of deep learning to the rich world of object-based, symbolic knowledge. In computer vision, for example, images consist of different features: edges and curves at a lower level, and objects and relations at a higher level. Even though it is not feasible to describe the low-level features in natural language, the attributes of objects and the relations between them can be represented by symbols and are well documented throughout human literature. Therefore, developing novel and effective architectures that can learn and utilize symbolic knowledge within the differentiable deep learning framework is essential. To this end, this dissertation argues for methods that map symbols to image-grounded representations, such that symbols and images share the same representation space. Furthermore, we discuss the key role of top-down processes in utilizing object-level knowledge; top-down signals have been shown to play a significant role in the human brain in overcoming challenges such as occlusion. For example, even though an image might not contain enough pixels of a truck's wheel, once the truck itself has been detected in the top layers of a neural network, this higher-level knowledge can be used to recognize that a small area in a corner corresponds to the wheel. Current feedforward neural networks, however, lack effective inductive biases for such top-down processing. We show that grounding symbols in images and employing top-down mechanisms not only improves scene understanding but also allows us to benefit from the massive pool of human-written symbolic knowledge in addition to image annotations. In summary, this dissertation introduces significant advances in artificial intelligence, particularly in computer vision and commonsense modeling. We propose models that utilize (1) structured knowledge, (2) unstructured text, and (3) 3D information to improve scene understanding, and through large-scale experiments we show that our models significantly improve state-of-the-art results.
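To make the two ideas in the abstract more concrete, the sketch below is a minimal, hypothetical PyTorch illustration (not the dissertation's actual architecture): symbol embeddings are projected into the same feature space as image region features ("grounding"), and the most likely grounded symbol for each region is fed back to refine that region's features (a crude stand-in for a "top-down" signal). All module names, dimensions, and the feedback rule are assumptions chosen for illustration only.

```python
# Hypothetical sketch of symbol grounding + a simple top-down feedback step.
# Not the method from the dissertation; an illustration of the general idea.
import torch
import torch.nn as nn

class GroundedTopDown(nn.Module):
    def __init__(self, num_symbols=100, sym_dim=300, img_dim=512):
        super().__init__()
        self.symbol_emb = nn.Embedding(num_symbols, sym_dim)  # symbolic vocabulary
        self.ground = nn.Linear(sym_dim, img_dim)              # map symbols into the image feature space
        self.top_down = nn.Linear(img_dim * 2, img_dim)        # fuse higher-level context back into regions

    def forward(self, region_feats, symbol_ids):
        # region_feats: (num_regions, img_dim) bottom-up image region features
        grounded = self.ground(self.symbol_emb(symbol_ids))    # (num_symbols, img_dim)
        # bottom-up classification: similarity of each region to each grounded symbol
        logits = region_feats @ grounded.t()                   # (num_regions, num_symbols)
        # top-down pass: the most likely symbol's grounded embedding refines each region
        context = grounded[logits.argmax(dim=-1)]              # (num_regions, img_dim)
        refined = self.top_down(torch.cat([region_feats, context], dim=-1))
        return logits, refined

# Usage with random features standing in for detected image regions.
model = GroundedTopDown()
regions = torch.randn(5, 512)
logits, refined = model(regions, torch.arange(100))
```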
| Item Type: | Theses (Dissertation, LMU Munich) |
|---|---|
| Keywords: | Visual Language Models, Scene Graph Classification, Symbol Grounding, Schemata, Piaget, Computer Vision |
| Subjects: | 000 Computers, Information and General Reference; 000 Computers, Information and General Reference > 004 Data processing computer science |
| Faculties: | Faculty of Mathematics, Computer Science and Statistics |
| Language: | English |
| Date of oral examination: | 7 February 2023 |
| 1. Referee: | Tresp, Volker |
| MD5 Checksum of the PDF-file: | 48927f5eccb07b703671dfde6432d162 |
| Signature of the printed copy: | 0001/UMC 30217 |
| ID Code: | 33176 |
| Deposited On: | 27 Feb 2024 15:05 |
| Last Modified: | 28 Feb 2024 14:41 |