Representation learning for domain adaptation and cross-modal retrieval

Representation learning for domain adaptation and cross-modal retrieval: in the context of online handwriting recognition and visual self-localization

Most machine learning applications involve a domain shift between the data on which a model was initially trained and data from a similar but different domain to which the model is later applied. Applications range from human-computer interaction (e.g., humans with different characteristics for speech or handwriting recognition) and computer vision (e.g., a change of weather conditions or objects in the environment for visual self-localization) to natural language processing (e.g., switching between different languages). Another related field is cross-modal retrieval, which aims to efficiently extract information from various modalities. In this field, the data can vary between modalities, and such variations can negatively impact the performance of the model. To reduce the impact of domain shift, methods search for an optimal transformation from the source to the target domain or an optimal alignment of modalities to learn a domain-invariant representation that is not affected by domain differences. The alignment of features of various data sources that are affected by domain shift requires representation learning techniques. These techniques are used to learn a meaningful representation that can be interpreted, or that includes latent features through the use of deep metric learning (DML). DML minimizes the distance between features by using the standard Euclidean loss, maximizes the similarity of features through cross-correlation, or decreases the discrepancy of higher-order statistics, for example with the maximum mean discrepancy. Pairwise learning and contrastive learning form a similar but distinct field that also employs DML. Contrastive learning not only aligns the features of data input pairs that have the same class label, but also increases the distance between pairs that have similar but different labels, thus enhancing the training process.
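The alignment objectives named above can be illustrated with a minimal sketch in plain Python. It is not the implementation used in the dissertation: the function names, the RBF kernel, the biased MMD estimator, and the margin-based pairwise form of the contrastive loss are all assumptions chosen for illustration, and features are represented as plain lists of floats.

```python
import math


def squared_euclidean(x, y):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def pearson_correlation(x, y):
    """Pearson cross-correlation of two feature vectors (1.0 means aligned)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel; gamma is a hypothetical bandwidth parameter."""
    return math.exp(-gamma * squared_euclidean(x, y))


def mmd_squared(source, target, gamma=1.0):
    """Biased estimate of the squared maximum mean discrepancy
    between two sets of feature vectors."""
    def mean_kernel(set_a, set_b):
        total = sum(rbf_kernel(a, b, gamma) for a in set_a for b in set_b)
        return total / (len(set_a) * len(set_b))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))


def contrastive_loss(x, y, same_label, margin=1.0):
    """Margin-based pairwise contrastive loss (one common formulation):
    pull same-label pairs together, push different-label pairs
    at least `margin` apart."""
    distance = math.sqrt(squared_euclidean(x, y))
    if same_label:
        return distance ** 2
    return max(0.0, margin - distance) ** 2
```

In this sketch, a training loop would minimize `squared_euclidean` or `mmd_squared` between source and target features, maximize `pearson_correlation`, or sum `contrastive_loss` over labeled pairs. Which objective works best is an empirical question that depends on the task.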
This research presents techniques for domain adaptation and cross-modal retrieval that specifically focus on the following two applications. (1) Online handwriting recognition involves representing written characters as multivariate time-series data from sensor-enhanced pens and aims to classify the written text. We recorded and evaluated various datasets for single character and sequence-to-sequence classification, and made them publicly available. We evaluated the domain shift that can occur between right- and left-handed writers, as well as between different writing styles, using uncertainty quantification techniques. Our approach utilizes higher-order statistics or optimal transport to adjust the features between right- and left-handed writers in order to minimize this domain shift. The best transformation is selected using DML techniques. Additionally, we assess the effectiveness of contrastive learning and DML for adapting the domain between writing on tablet and on paper, as well as for cross-modal retrieval in offline and online handwriting recognition. (2) Visual self-localization aims to determine the absolute and relative position and orientation of a human or robot using a single monocular camera. We propose to improve absolute pose prediction by adding an auxiliary task that predicts the relative pose from optical flow during training, and by pre-training on simulated data. In addition, we evaluate different fusion methods that utilize representation learning to combine information from visual and inertial sensors.
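The optimal-transport adaptation mentioned under application (1) can be sketched as follows. The abstract does not say which solver is used, so this example assumes entropy-regularized optimal transport solved with Sinkhorn iterations, uniform sample weights, a squared Euclidean cost, and a barycentric mapping of source features onto the target domain. All names and parameter values (`sinkhorn_plan`, `barycentric_map`, `eps`, `n_iters`, the toy features) are hypothetical.

```python
import math


def sinkhorn_plan(source, target, eps=1.0, n_iters=100):
    """Entropy-regularized transport plan between two feature sets
    with uniform weights (Sinkhorn iterations)."""
    n, m = len(source), len(target)
    a = [1.0 / n] * n  # source marginal
    b = [1.0 / m] * m  # target marginal
    # Squared Euclidean cost matrix and its Gibbs kernel
    cost = [[sum((s - t) ** 2 for s, t in zip(x, y)) for y in target]
            for x in source]
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m))
             for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def barycentric_map(plan, target):
    """Move each source sample to the plan-weighted mean of the targets."""
    dim = len(target[0])
    mapped = []
    for row in plan:
        mass = sum(row)
        mapped.append([sum(row[j] * target[j][d] for j in range(len(target)))
                       / mass for d in range(dim)])
    return mapped


# Toy one-dimensional features standing in for two writer groups
source = [[0.0], [1.0]]
target = [[0.1], [1.1]]
plan = sinkhorn_plan(source, target)
adapted = barycentric_map(plan, target)
```

A smaller `eps` yields a sharper, more one-to-one plan but needs more iterations to converge, so the regularization strength is a trade-off rather than a fixed choice.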

representation learning, deep metric learning, domain adaptation, cross-modal retrieval, multi-modal fusion, time-series classification, online handwriting recognition, visual self-localization, pose regression

Ott, Felix

18. Jul. 2023

2023

English

Universitätsbibliothek der Ludwig-Maximilians-Universität München

https://nbn-resolving.org/urn:nbn:de:bvb:19-324674

Ott, Felix (2023): Representation learning for domain adaptation and cross-modal retrieval: in the context of online handwriting recognition and visual self-localization. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik

PDF
Ott_Felix.pdf
116MB

DOI: 10.5282/edoc.32467

URN: urn:nbn:de:bvb:19-324674

Document type:	Dissertations (Dissertation, LMU München)
Subject areas:	000 Generalities, computer science, information science; 000 Generalities, computer science, information science > 004 Computer science
Faculty:	Fakultät für Mathematik, Informatik und Statistik
Language of the thesis:	English
Date of oral examination:	18 July 2023
1st referee:	Bischl, Bernd
MD5 checksum of the PDF file:	7fe72e0be5834326cf599fbb8ab229ed
Shelf mark of the printed edition:	0001/UMC 29911
ID Code:	32467
Deposited on:	28 Sep. 2023 12:13
Last modified:	04 Oct. 2023 10:29
