De Droogh, Joachim
[UCL]
Lee, John Aldo
[UCL]
Many applications on smartphones and other wearables use a localization system to track their users. This master's thesis gives meaning to GPS traces (latitude, longitude, timestamp) by detecting a user's meaningful locations from them. Its purpose is to find an algorithm and a set of input parameters that give the most satisfactory results when extracting meaningful locations from datasets. The thesis was carried out in collaboration with SONY. We first build an intuitive solution from scratch, without relying on the specialized literature. We augment a public dataset with the speed computed between successive observations and keep only the observations with a speed below 2 km/h. We display the observations on maps with Google tools, but this solution is limited to public datasets. We also try to obtain our first meaningful locations in Excel sheets by grouping the observations that lie within a chosen radius of one another; this could work, but it does not scale, because each observation has to be compared in turn with all the others. We then turn to the concept of "density". Unsupervised machine learning helps to determine clusters in the datasets by grouping data based on similarity. DBSCAN is one such algorithm: it defines a cluster as a maximal set of density-connected points, based on the ε-neighborhood and the minimum number of points per cluster. OPTICS is a more advanced version of DBSCAN in which the density can differ from one cluster to another. K-MEANS is another type of algorithm, which creates K clusters so as to minimize a function: the sum of the squared Euclidean distances from each point to the mean of the points in its cluster. Combining OPTICS with K-MEANS, we build a program that tries several combinations of the algorithms and parameter sets (input values) on the five datasets from SONY. We collect the result sets and validate them to keep only realistic results.
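The speed filter described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes a trace is a time-sorted list of (latitude, longitude, timestamp in seconds) tuples, uses the haversine formula for distances, and skips the first observation because it has no predecessor. The function names and the Earth radius constant are hypothetical choices made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def slow_observations(trace, max_speed_kmh=2.0):
    """Keep observations reached at a speed below max_speed_kmh.

    trace: list of (lat, lon, timestamp_seconds), sorted by timestamp.
    The speed of an observation is measured from its predecessor, so the
    first observation is never kept (an assumption of this sketch).
    """
    kept = []
    for prev, cur in zip(trace, trace[1:]):
        hours = (cur[2] - prev[2]) / 3600.0
        if hours <= 0:
            continue  # ignore duplicate or out-of-order timestamps
        speed = haversine_km(prev[0], prev[1], cur[0], cur[1]) / hours
        if speed < max_speed_kmh:
            kept.append(cur)
    return kept
```

For example, in a trace sampled every 60 seconds, an observation at the same coordinates as its predecessor is kept, while one that lies about 1.1 km from its predecessor is dropped.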
We then switch to supervised learning by using the ground truths as input for the validations. We assign a score based on a time criterion (the percentage of duration shared with the ground-truth clusters) and a distance criterion (the proximity of the cluster centers to the ground truths). The final score is the product of the two percentages. Each cluster center should be a meaningful location. The highest score is obtained by the following combination: run OPTICS, then reapply OPTICS to the same data without the outliers. However, this highest score is reached on only one dataset; the other datasets score much lower. We therefore take, for each combination of algorithms, the geometric mean of the highest scores obtained per user. The algorithm retained is OPTICS followed by K-MEANS: the cluster centers are determined by K-MEANS, based on the number of clusters found by OPTICS. The most appropriate parameters from the proposed parameter set are also determined.
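The scoring and ranking steps can be sketched as follows. The abstract does not say how the time and distance percentages are computed, so this sketch takes them as given values in [0, 1]. The function names, the combination labels, and the numbers in the usage note are hypothetical and are not results from the thesis.

```python
import math


def final_score(time_pct, distance_pct):
    """Final score: product of the time and distance criteria, each in [0, 1]."""
    return time_pct * distance_pct


def geometric_mean(scores):
    """Geometric mean of non-negative scores; 0.0 if any score is zero."""
    if not scores:
        raise ValueError("scores must not be empty")
    if any(s < 0 for s in scores):
        raise ValueError("scores must be non-negative")
    if any(s == 0 for s in scores):
        return 0.0
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


def best_combination(best_scores_per_user):
    """Return the combination whose per-user best scores have the highest geometric mean.

    best_scores_per_user: dict mapping a combination name to a list holding
    the highest score obtained for each user (dataset).
    """
    return max(best_scores_per_user,
               key=lambda name: geometric_mean(best_scores_per_user[name]))
```

With illustrative values such as `{"OPTICS+OPTICS": [0.9, 0.1, 0.1], "OPTICS+KMEANS": [0.5, 0.4, 0.5]}`, `best_combination` returns `"OPTICS+KMEANS"`: the geometric mean penalizes a combination that scores well on a single dataset and poorly on the others.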


Bibliographic reference
De Droogh, Joachim. Determining meaningful locations in user's life from raw data logged by smartphones and wearables. Ecole polytechnique de Louvain, Université catholique de Louvain, 2017. Supervisor: Lee, John Aldo.
Permalink
http://hdl.handle.net/2078.1/thesis:12956