Logo Logo
Help
Contact
Switch language to German
Enhanced Query Processing on Complex Spatial and Temporal Data
Enhanced Query Processing on Complex Spatial and Temporal Data
Innovative technologies in the area of multimedia and mechanical engineering as well as novel methods for data acquisition in different scientific subareas, including geo-science, environmental science, medicine, biology and astronomy, enable a more exact representation of the data, and thus, a more precise data analysis. The resulting quantitative and qualitative growth of specifically spatial and temporal data leads to new challenges for the management and processing of complex structured objects and requires the employment of efficient and effective methods for data analysis. Spatial data denote the description of objects in space by a well-defined extension, a specific location and by their relationships to the other objects. Classical representatives of complex structured spatial objects are three-dimensional CAD data from the sector "mechanical engineering" and two-dimensional bounded regions from the area "geography". For industrial applications, efficient collision and intersection queries are of great importance. Temporal data denote data describing time dependent processes, as for instance the duration of specific events or the description of time varying attributes of objects. Time series belong to one of the most popular and complex type of temporal data and are the most important form of description for time varying processes. An elementary type of query in time series databases is the similarity query which serves as basic query for data mining applications. The main target of this thesis is to develop an effective and efficient algorithm supporting collision queries on spatial data as well as similarity queries on temporal data, in particular, time series. The presented concepts are based on the efficient management of interval sequences which are suitable for spatial and temporal data. The effective analysis of the underlying objects will be efficiently supported by adequate access methods. First, this thesis deals with collision queries on complex spatial objects which can be reduced to intersection queries on interval sequences. We introduce statistical methods for the grouping of subsequences. Involving the concept of multi-step query processing, these methods enable the user to accelerate the query process drastically. Furthermore, in this thesis we will develop a cost model for the multi-step query process of interval sequences in distributed systems. The proposed approach successfully supports a cost based query strategy. Second, we introduce a novel similarity measure for time series. It allows the user to focus specific time series amplitudes for the similarity measurement. The new similarity model defines two time series to be similar iff they show similar temporal behavior w.r.t. being below or above a specific threshold. This type of query is primarily required in natural science applications. The main goal of this new query method is the detection of anomalies and the adaptation to new claims in the area of data mining in time series databases. In addition, a semi-supervised cluster analysis method will be presented which is based on the introduced similarity model for time series. The efficiency and effectiveness of the proposed techniques will be extensively discussed and the advantages against existing methods experimentally proofed by means of datasets derived from real-world applications.
query processing, database, spatial data, time series, high resolution
Renz, Matthias
2006
English
Universitätsbibliothek der Ludwig-Maximilians-Universität München
Renz, Matthias (2006): Enhanced Query Processing on Complex Spatial and Temporal Data. Dissertation, LMU München: Faculty of Mathematics, Computer Science and Statistics
[thumbnail of renz_matthias.pdf]
Preview
PDF
renz_matthias.pdf

7MB

Abstract

Innovative technologies in the area of multimedia and mechanical engineering as well as novel methods for data acquisition in different scientific subareas, including geo-science, environmental science, medicine, biology and astronomy, enable a more exact representation of the data, and thus, a more precise data analysis. The resulting quantitative and qualitative growth of specifically spatial and temporal data leads to new challenges for the management and processing of complex structured objects and requires the employment of efficient and effective methods for data analysis. Spatial data denote the description of objects in space by a well-defined extension, a specific location and by their relationships to the other objects. Classical representatives of complex structured spatial objects are three-dimensional CAD data from the sector "mechanical engineering" and two-dimensional bounded regions from the area "geography". For industrial applications, efficient collision and intersection queries are of great importance. Temporal data denote data describing time dependent processes, as for instance the duration of specific events or the description of time varying attributes of objects. Time series belong to one of the most popular and complex type of temporal data and are the most important form of description for time varying processes. An elementary type of query in time series databases is the similarity query which serves as basic query for data mining applications. The main target of this thesis is to develop an effective and efficient algorithm supporting collision queries on spatial data as well as similarity queries on temporal data, in particular, time series. The presented concepts are based on the efficient management of interval sequences which are suitable for spatial and temporal data. The effective analysis of the underlying objects will be efficiently supported by adequate access methods. First, this thesis deals with collision queries on complex spatial objects which can be reduced to intersection queries on interval sequences. We introduce statistical methods for the grouping of subsequences. Involving the concept of multi-step query processing, these methods enable the user to accelerate the query process drastically. Furthermore, in this thesis we will develop a cost model for the multi-step query process of interval sequences in distributed systems. The proposed approach successfully supports a cost based query strategy. Second, we introduce a novel similarity measure for time series. It allows the user to focus specific time series amplitudes for the similarity measurement. The new similarity model defines two time series to be similar iff they show similar temporal behavior w.r.t. being below or above a specific threshold. This type of query is primarily required in natural science applications. The main goal of this new query method is the detection of anomalies and the adaptation to new claims in the area of data mining in time series databases. In addition, a semi-supervised cluster analysis method will be presented which is based on the introduced similarity model for time series. The efficiency and effectiveness of the proposed techniques will be extensively discussed and the advantages against existing methods experimentally proofed by means of datasets derived from real-world applications.