From spatio-temporal trajectories to succinct and semantically meaningful patterns

Bermingham, Luke Leslie (2018) From spatio-temporal trajectories to succinct and semantically meaningful patterns. PhD thesis, James Cook University.

[img]
Preview
PDF (Thesis)
Download (2MB) | Preview
View at Publisher Website: https://doi.org/10.4225/28/5afa1141b90e5
 
388


Abstract

It is now possible to track moving entities such as humans, animals, or vehicles at relatively high sampling-rates, over long durations of time. This produces large, detailed spatio-temporal trajectories that contain millions of geographic positions and timestamps. These large spatio-temporal trajectories capture the potentially-interesting behaviours of individual entities; they are prime candidates for data mining and knowledge discovery. However, within the trajectory data mining and knowledge discovery process I have identified four challenges that hinder the discovery of succinct and semantically meaningful trajectory patterns: spatial uncertainty, trajectory complexity, pattern complexity, and semantic meaning.

The first challenge, spatial uncertainty, is present in many GPS trajectories. As the global positioning technology used to record trajectories is not entirely accurate, it produces noisy recordings for various reasons: antenna quality, satellite availability, and multi-path errors, inclusively. This extra noisiness in the data makes trajectories more difficult to efficiently mine; it increases the likelihood of discovering false patterns, also, potentially, masking real patterns.

The second challenge, trajectory complexity, refers to the large size and redundancy that is now typical due to the high sampling-rate and long-duration of real-world trajectory data collection. High sampling-rates and long durations are both effective techniques to increase the likelihood of capturing more patterns. I expect that trajectories will be sampled at increasingly higher rates, over longer durations, especially due to the low cost of storage and advances in battery technology. Decreasing the sampling-rate or duration is not an ideal solution because it arbitrarily reduces the amount of information captured. Unsimplified, however, these large trajectories do significantly slow down and complicate the mining process; as we mine increasingly larger trajectory datasets a solution is increasingly important.

Pattern complexity is the third challenge, and occurs because many existing trajectory data mining approaches produce pattern outputs that are not succinct or easily interpretable. In fact, the pattern outputs are sometimes so large they can overload the user and make knowledge discovery overly time consuming. Typically, to make sense of such trajectory patterns a human operator is required to use their expertise to further filter the results or manually select key patterns to represent supposed general trends. Ideally, mined trajectory patterns should be succinct enough that any post-processing by a human operator would needlessly discard information: if additional processing is required to aid pattern interpretation, that processing is far better suited as a step in the mining process.

Semantic meaning refers to the inherent absence of any contextual information present in raw spatio-temporal trajectories; it is the fourth challenge. Typically, raw spatio-temporal trajectories only record the geographic location with a time-stamp. Mining these raw spatio-temporal trajectories alone, limits the type of discoverable patterns. To uncover knowledge beyond pure movement patterns, extra contextual information that is semantically meaningful, in the application domain, is required. Each type of semantic information that could be inferred or combined with trajectories, presents a unique challenge.

Overall, the aim of this thesis is to investigate solutions to these four challenges to ultimately produce succinct and semantically meaningful trajectory patterns. To do so I divided the thesis into four parts and in each part I addressed some combination of these four challenges at various stages in the trajectory data mining and knowledge discovery process. In the first, I investigated a pre-processing solution that addresses the challenge of trajectory complexity. Specifically, I introduced a framework to create spatio-temporal trajectory simplification approaches. Using this framework I created several spatio-temporal simplification algorithms based on well-known poly-line simplification techniques, evaluating them using multiple real-world trajectory datasets. The results indicated that a number of the simplification algorithms produced were both efficient and effective at reducing the trajectory complexity of the tested real-world trajectories.

Second, I investigated a sequential pattern mining approach that addresses all four challenges in the context of vehicle trajectories. They were chosen for this section because they are simpler to process: they are constrained, in that they only travel underlying road networks. In this approach, I map-matched several real-world vehicle trajectory datasets onto road networks, thus removing their spatial uncertainty, reducing their complexity, and transforming them into a series of semantically meaningful street names. When using traditional sequential pattern mining approaches, mining these large, yet highly redundant sequences produced far too many patterns; meaningful interpretation became impossible. Thus, to overcome the challenge of pattern complexity I mined the sequences using an algorithm I created called DC-SPAN. DC-SPAN mines a highly succinct, but lossy, set of contiguous patterns where the user can control the sub-sequence redundancy of the pattern output. Experiment results on real-world bus trajectories showed that compared to existing contiguous sequential pattern mining approaches DC-SPAN was able to achieve as much a 98% compression in its pattern output while trading off only a 20% increase in lossiness. The results of this section were promising, but ultimately, largely specific to the vehicle trajectory domain.

In the third part I introduced a pre-processing approach called POSMIT. POSMIT annotates each spatio-temporal entry, in a trajectory, with a semantic label indicating whether the entity was stopping or moving within that recording. This semantic stop/move label is a step towards the challenge of enriching raw trajectories with semantic meaning because it can be used to infer further semantic information later in the trajectory data mining process. For example, an extended subsequence of stopping entries occurring inside a restaurant may indicate that the tracked entity was dining. Existing stop/move classification approaches are based on geographic and clustering-based concepts. These approaches definitively label each entry as a stop or a move, meaning that the accuracy of the resulting classification is strongly linked to the user's ability to estimate the required parameters. Conversely, POSMIT computes the probability that a given entry is stopping; then, stopping entries with probabilities below a user-specified threshold are filtered out and become moves. Unlike existing approaches, this feature of POSMIT allows users to tend the result towards having less false-positive stop classifications, which is important in applications like data mining where false-positives can lead to false patterns. The experiment results on real-world ground-truth stop/move annotated trajectories revealed that, compared to the existing approaches that were tested, POSMIT achieved a higher classification accuracy while also being more robust in parameter selection.

Finally, I used several concepts from the previous sections, both directly and indirectly, and introduced STOSEM: an overall semantic trajectory data mining approach that addresses all four of the identified challenges. STOSEM begins by using POSMIT to enrich each trajectory with stop/move information. Then, STOSEM proceeds to cluster these stop/move annotated trajectories into sequences of extended stop episodes. Clustering the individual recordings into stop episodes greatly reduces the raw trajectory complexity, neatly handling the problem of spatial uncertainty by representing many stopping entries within a single stop radius. After the stop episode formulation STOSEM then incorporates a repository of real-world places. This addresses the challenge of semantic meaning: all nearby real-world places are associated with relevant stop episodes to build a list of candidate places where the stop may have actually occurred. STOSEM, then, proceeds to match each sequence of stop episodes to its likely sequence of visited real-world places, using a probabilistic place-matching algorithm. In general, STOSEM's steps, as described thus far, essentially transform each raw trajectory into a discrete, succinct, and human readable series of place visitations that describe the journey of each tracked individual. This transformation makes data mining simple because this sequence of places is an ideal input for traditional frequent itemset and sequential pattern mining algorithms while remaining succinct enough to avoid the issue of pattern complexity.

To evaluate the efficiency and effectiveness of STOSEM I performed quantitative experiments using real-world and synthetic datasets. Some key findings from the quantitative experiments include: that the stop episode clustering is an effective simplification technique that preserves semantic meaning; that the proposed probabilistic place-matching approach was more accurate than the non-probabilistic approaches I tested. Additionally, to empirically test the applicability of STOSEM I performed a case study using a real-world human trajectory dataset called Geolife. The case study revealed that mining these place visitations, using traditional frequent itemset and sequential pattern mining algorithms, produced succinct and semantic trajectory pattern output. A key finding from the patterns was the existence of certain kinds of participants, such as University attendees and Microsoft employees: both specific participant demographics reported by the original Geolife researchers. Another key finding was the observed behaviours of certain kinds of participants, based on their visitations. For example, some participants seemingly revealed their University attendance timetables through their collected trajectories. These kinds of results, extracted neatly from raw spatio-temporal trajectories are very encouraging and motivate us to extend this approach to future works. This technique could also uncover move episodes between extended stops; they could then be classified into the likely mode of transport being used (i.e. car, bus, walking etc.).

Overall, this study produced several approaches that addressed the challenges of spatial uncertainty, trajectory complexity, pattern complexity, and semantic meaning across multiple real-world trajectory data mining problems. In particular, my cumulative work on STOSEM addresses all challenges, producing succinct and semantically meaningful patterns for real-world human trajectory datasets.

Item ID: 53636
Item Type: Thesis (PhD)
Keywords: probability, stops and moves, trajectories, spatio-temporal trajectories, semantic patterns, data mining, spatial uncertainty, trajectory complexity, pattern complexity, semantic meaning
Related URLs:
Additional Information:

Publications arising from this thesis are available from the Related URLs field. The publications are:

Chapter 3: Bermingham, Luke, and Lee, Ickjai (2017) A framework of spatio-temporal trajectory simplification methods. International Journal of Geographic Information Science, 31 (6). pp. 1128-1153.

Date Deposited: 14 May 2018 23:41
FoR Codes: 08 INFORMATION AND COMPUTING SCIENCES > 0801 Artificial Intelligence and Image Processing > 080109 Pattern Recognition and Data Mining @ 100%
SEO Codes: 89 INFORMATION AND COMMUNICATION SERVICES > 8902 Computer Software and Services > 890202 Application Tools and System Utilities @ 100%
Downloads: Total: 388
Last 12 Months: 30
More Statistics

Actions (Repository Staff Only)

Item Control Page Item Control Page