Advanced Analysis on Temporal Data

Due to the increase in CPU power and ever-increasing data storage capacity, more and more data of all kinds is recorded, including temporal data. Time series, the most prevalent type of temporal data, arise in a broad range of application domains. Prominent examples include stock price data in economics, gene expression data in biology, the course of environmental parameters in meteorology, and data on moving objects recorded by traffic sensors. Such large amounts of raw data can only be analyzed by automated data mining algorithms in order to generate new knowledge. One of the most basic data mining operations is the similarity query, which computes a similarity or distance value for two objects. Two aspects of such a similarity function are of special interest: first, its semantics, and second, the computational cost of calculating a similarity value. The semantics is the actual notion of similarity and is highly dependent on the analysis task at hand. This thesis addresses both aspects. We introduce a number of new similarity measures for time series data and show how they can be calculated efficiently by means of index structures and query algorithms.

The first of the new similarity measures is threshold-based. Two time series are considered similar if they exceed a user-given threshold during similar time intervals. Aside from formally defining this similarity measure, we show how to represent time series in such a way that threshold-based queries can be calculated efficiently. Our representation allows the threshold value to be specified at query time. This is useful, for example, for data mining tasks that try to determine crucial thresholds.

The next similarity measure considers a relevant amplitude range. This range is scanned with a certain resolution, and features are extracted for each amplitude value considered. We consider the change in the feature values over the amplitude values and thus generate so-called feature sequences. Different features can finally be combined to answer amplitude-level-based similarity queries. In contrast to traditional approaches, which aggregate global feature values along the time dimension, we capture local characteristics and monitor their change across different amplitude values. Furthermore, our method enables the user to specify the relevant range of amplitude values to be considered, so that the similarity notion can be adapted to the current requirements.

Next, we introduce so-called interval-focused similarity queries. A user can specify one or several time intervals that should be considered when calculating the similarity value. Our main focus for this similarity measure was efficient support of the corresponding query. In particular, we try to avoid loading complete time series objects into main memory if only a relatively small portion of a time series is of interest. We propose a time series representation that can be used to calculate upper and lower distance bounds, so that only a few time series objects have to be completely loaded and refined. Again, the relevant time intervals do not have to be known in advance.

Finally, we define a similarity measure for so-called uncertain time series, where several amplitude values are given for each point in time. This can be due to multiple recordings or to measurement errors, so that no exact value can be specified. We show how to efficiently support queries on uncertain time series. The last part of this thesis shows how data mining methods can be used to discover crucial threshold parameters for the threshold-based similarity measure. Furthermore, we present a data mining tool for time series.
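
As an illustration of the threshold-based similarity notion summarized above, the following minimal Python sketch extracts the time intervals during which a series exceeds a threshold and compares two series through those intervals. The abstract does not give the formal definition, so everything here is an assumption made for illustration: the function names (`threshold_intervals`, `interval_distance`, `threshold_distance`), the use of index positions as time stamps, and the choice of a symmetric average nearest-interval distance. The sketch also scans the raw series on every call and does not reproduce the index structures or the time series representation developed in the thesis.

```python
from math import hypot


def threshold_intervals(values, tau):
    """Return half-open index intervals [start, end) where values exceed tau."""
    intervals = []
    start = None
    for i, v in enumerate(values):
        if v > tau and start is None:
            start = i                      # series rises above the threshold
        elif v <= tau and start is not None:
            intervals.append((start, i))   # series falls back to or below it
            start = None
    if start is not None:                  # series ends while above the threshold
        intervals.append((start, len(values)))
    return intervals


def interval_distance(a, b):
    """Assumed interval distance: Euclidean distance of (start, end) pairs."""
    return hypot(a[0] - b[0], a[1] - b[1])


def threshold_distance(x, y, tau):
    """Assumed set distance: symmetric average distance to the nearest interval."""
    ix, iy = threshold_intervals(x, tau), threshold_intervals(y, tau)
    if not ix and not iy:
        return 0.0                         # neither series exceeds tau
    if not ix or not iy:
        return float("inf")                # only one series exceeds tau

    def one_way(src, dst):
        return sum(min(interval_distance(a, b) for b in dst) for a in src) / len(src)

    return 0.5 * (one_way(ix, iy) + one_way(iy, ix))


if __name__ == "__main__":
    x = [0, 2, 3, 0, 0, 4, 0]
    y = [0, 0, 3, 3, 0, 4, 0]
    # The threshold is an ordinary argument, so it can be chosen per query.
    for tau in (1, 3.5):
        print(tau, threshold_intervals(x, tau), threshold_distance(x, y, tau))
```

Because `tau` is passed at call time, the same two series can be compared under different thresholds, which mirrors the query-time threshold specification mentioned in the abstract.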

Time Series, Similarity, Data Mining

Aßfalg, Johannes

14. Jul. 2008

2008

English

Universitätsbibliothek der Ludwig-Maximilians-Universität München

https://nbn-resolving.org/urn:nbn:de:bvb:19-87985

Aßfalg, Johannes (2008): Advanced Analysis on Temporal Data. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik

PDF: Assfalg_Johannes.pdf (3MB)

DOI: 10.5282/edoc.8798

URN: urn:nbn:de:bvb:19-87985

Abstract

Document type:	Dissertations (Dissertation, LMU München)
Keywords:	Time Series, Similarity, Data Mining
Subjects:	500 Natural sciences and mathematics > 510 Mathematics; 500 Natural sciences and mathematics
Faculties:	Fakultät für Mathematik, Informatik und Statistik
Language of the thesis:	English
Date of oral examination:	14 July 2008
First referee:	Kriegel, Hans-Peter
MD5 checksum of the PDF file:	51e9599fc4b247865f3104dafea62f4e
Shelf mark of the printed edition:	0001/UMC 17127
ID Code:	8798
Deposited on:	29 Jul. 2008 07:03
Last modified:	24 Oct. 2020 07:13
