Representation learning for domain adaptation and cross-modal retrieval: in the context of online handwriting recognition and visual self-localization
Most machine learning applications involve a domain shift between the data on which a model was initially trained and the data from a similar but different domain to which the model is later applied. Applications range from human-computer interaction (e.g., humans with different characteristics for speech or handwriting recognition) and computer vision (e.g., a change of weather conditions or objects in the environment for visual self-localization) to natural language processing (e.g., switching between different languages). Another related field is cross-modal retrieval, which aims to efficiently extract information from various modalities. In this field, the data can vary between modalities, and such variations can negatively impact the performance of the model. To reduce the impact of domain shift, methods search for an optimal transformation from the source to the target domain or an optimal alignment of modalities to learn a domain-invariant representation that is not affected by domain differences. Aligning the features of various data sources that are affected by domain shift requires representation learning techniques. These techniques are used to learn a meaningful representation that can be interpreted, or that includes latent features through the use of deep metric learning (DML). DML minimizes the distance between features by using the standard Euclidean loss, maximizes the similarity of features through cross-correlation, or decreases the discrepancy of higher-order statistics such as the maximum mean discrepancy. A similar but distinct field is pairwise and contrastive learning, which also employs DML. Contrastive learning not only aligns the features of input pairs that have the same class label, but also increases the distance between pairs that have similar but different labels, thus enhancing the training process.
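The alignment objectives named above can be illustrated with a minimal NumPy sketch. It is not taken from the dissertation: the function names, the RBF kernel with bandwidth `gamma`, the row-wise Pearson form of cross-correlation, and the margin-based contrastive loss are all assumptions chosen for illustration.

```python
import numpy as np

def euclidean_loss(a, b):
    """Mean squared Euclidean distance between paired feature rows."""
    return float(np.mean(np.sum((a - b) ** 2, axis=1)))

def cross_correlation(a, b, eps=1e-8):
    """Mean Pearson correlation between paired feature rows (higher = more similar)."""
    a_c = a - a.mean(axis=1, keepdims=True)
    b_c = b - b.mean(axis=1, keepdims=True)
    num = np.sum(a_c * b_c, axis=1)
    den = np.linalg.norm(a_c, axis=1) * np.linalg.norm(b_c, axis=1) + eps
    return float(np.mean(num / den))

def mmd_rbf(x, y, gamma=1.0):
    """Biased estimate of the squared maximum mean discrepancy (RBF kernel, assumed)."""
    def kernel(u, v):
        sq = (np.sum(u ** 2, axis=1)[:, None]
              + np.sum(v ** 2, axis=1)[None, :]
              - 2.0 * u @ v.T)
        return np.exp(-gamma * np.maximum(sq, 0.0))
    return float(kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean())

def contrastive_loss(a, b, same_label, margin=1.0):
    """Pull same-label pairs together; push other pairs at least `margin` apart."""
    d = np.linalg.norm(a - b, axis=1)
    pos = same_label * d ** 2
    neg = (1.0 - same_label) * np.maximum(margin - d, 0.0) ** 2
    return float(np.mean(pos + neg))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    source = rng.normal(size=(50, 4))   # hypothetical source-domain features
    target = source + 3.0               # hypothetical shifted target-domain features
    print("Euclidean loss:", euclidean_loss(source, target))
    print("Cross-correlation:", cross_correlation(source, target))
    print("Squared MMD:", mmd_rbf(source, target))
```

In a deep model these quantities would typically be computed on intermediate feature embeddings and minimized (or, for the correlation, maximized) together with the task loss; the sketch only shows the distance computations themselves.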
This research presents techniques for domain adaptation and cross-modal retrieval that focus on the following two applications. (1) Online handwriting recognition represents written characters as multivariate time-series data from sensor-enhanced pens and aims to classify the written text. We recorded and evaluated various datasets for single-character and sequence-to-sequence classification, and made them publicly available. We evaluated the domain shift that can occur between right- and left-handed writers, as well as between different writing styles, using uncertainty quantification techniques. Our approach utilizes higher-order statistics or optimal transport to adjust the features between right- and left-handed writers in order to minimize this domain shift. The best transformation is selected using DML techniques. Additionally, we assess the effectiveness of contrastive learning and DML for adapting the domain between writing on a tablet and on paper, as well as for cross-modal retrieval in offline and online handwriting recognition. (2) Visual self-localization aims to determine the absolute and relative position and orientation of a human or robot using only a single monocular camera. We propose to enhance the task of predicting the absolute pose by incorporating an auxiliary task of predicting the relative pose using optical flow during the learning process, and by pre-training on simulated data. In addition, we evaluate different fusion methods that utilize representation learning to combine information from visual and inertial sensors.
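As a rough illustration of adjusting source features toward a target domain with optimal transport, the following sketch uses entropy-regularized transport (Sinkhorn iterations) followed by a barycentric mapping. The abstract does not state which transport solver is used, so the solver, the regularization strength `reg`, the function names, and the synthetic data are all assumptions.

```python
import numpy as np

def sinkhorn_plan(xs, xt, reg=0.1, n_iter=200):
    """Entropy-regularized transport plan between uniform source/target samples."""
    n, m = len(xs), len(xt)
    a = np.full(n, 1.0 / n)          # uniform source weights
    b = np.full(m, 1.0 / m)          # uniform target weights
    cost = np.sum((xs[:, None, :] - xt[None, :, :]) ** 2, axis=2)
    cost = cost / cost.max()         # normalize the cost for numerical stability
    K = np.exp(-cost / reg)
    u = np.ones(n)
    for _ in range(n_iter):          # alternate scaling of columns and rows
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def barycentric_map(plan, xt):
    """Move each source sample to the plan-weighted average of the target samples."""
    return (plan @ xt) / plan.sum(axis=1, keepdims=True)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xs = rng.normal(size=(30, 2))          # hypothetical source-domain features
    xt = rng.normal(size=(40, 2)) + 5.0    # hypothetical target-domain features
    mapped = barycentric_map(sinkhorn_plan(xs, xt), xt)
    print("source mean:", xs.mean(axis=0))
    print("mapped mean:", mapped.mean(axis=0))
    print("target mean:", xt.mean(axis=0))
```

Each mapped sample is a convex combination of target samples, so the transformed source features lie inside the region covered by the target domain. A distance such as the ones used in DML could then be used to compare candidate transformations.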
representation learning, deep metric learning, domain adaptation, cross-modal retrieval, multi-modal fusion, time-series classification, online handwriting recognition, visual self-localization, pose regression
Ott, Felix
2023
English
Universitätsbibliothek der Ludwig-Maximilians-Universität München
Ott, Felix (2023): Representation learning for domain adaptation and cross-modal retrieval: in the context of online handwriting recognition and visual self-localization. Dissertation, LMU München: Faculty of Mathematics, Computer Science and Statistics