Literature Notes About Human Mobility

This is my note page for papers about human mobility.

All contents only represent my own viewpoints. They might be right or wrong.

Transfer Urban Human Mobility via POI Embedding over Multiple Cities

ACM/IMS Transactions on Data Science, December 2020

Trajectory Prediction Embedding POIs LSTM and CNN

The basic task is to predict the next position given a trajectory sequence (prediction task). Fusing GPS and POI data, the paper proposes an image-like embedding of POIs that represents a trajectory as an artificial video. This enables transfer learning across cities, which helps cities with limited observation data. An LSTM-on-CNNs architecture is designed to capture both spatio-temporal and geographical information.

One tricky and interesting point for me is the construction of the POI image. Different POI categories serve as different channels, while the value of a pixel is the number of POIs of one category in one mesh (which means that pixel values do not change over time). Trajectories are used as the capture frame (location meshes + window meshes) to generate the video clips. The main intuition is to mine sequential relationships of POIs. One potential problem in my opinion is the inefficient use of the expensive trajectory data.
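
To make the POI-image construction concrete, here is a minimal sketch. It assumes a regular grid of meshes and a hypothetical list of (row, col, category) POI records. The function names, the toy data and the window size are illustrative only and are not taken from the paper.

```python
from collections import Counter

def build_poi_image(pois, categories, n_rows, n_cols):
    """Return image[channel][row][col] = number of POIs of that category in the mesh."""
    counts = Counter(pois)  # (row, col, category) -> count
    return [
        [[counts[(r, c, cat)] for c in range(n_cols)] for r in range(n_rows)]
        for cat in categories
    ]

def crop_frame(image, center, half_window):
    """Cut a square window around one trajectory point; cells outside the grid become 0."""
    r0, c0 = center
    n_rows, n_cols = len(image[0]), len(image[0][0])
    frame = []
    for channel in image:
        rows = []
        for r in range(r0 - half_window, r0 + half_window + 1):
            row = []
            for c in range(c0 - half_window, c0 + half_window + 1):
                inside = 0 <= r < n_rows and 0 <= c < n_cols
                row.append(channel[r][c] if inside else 0)
            rows.append(row)
        frame.append(rows)
    return frame

def trajectory_to_clip(image, trajectory, half_window=1):
    """One frame per trajectory point; the static POI image is only re-cropped."""
    return [crop_frame(image, point, half_window) for point in trajectory]

# Hypothetical toy data: four POIs on a 3x3 grid, two categories.
pois = [(0, 0, "food"), (0, 0, "food"), (1, 2, "shop"), (2, 2, "food")]
categories = ["food", "shop"]
image = build_poi_image(pois, categories, 3, 3)
clip = trajectory_to_clip(image, [(0, 0), (1, 1), (2, 2)])
```

The sketch shows why the pixel values are static: only the crop position changes from frame to frame, so all temporal information comes from the trajectory itself.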

City2City: Translating Place Representations across Cities


Trajectory Prediction Embedding LSTM Transfer Learning

The background of this paper is the location prediction task given trajectory history. The place representation matrix is obtained from an LSTM-RNN using stay point sequences generated from GPS data. All three proposed models are based on this representation. The preferred model trains a transformation function that maps the representations into a common vector space. The author also develops two other models for comparison: one simply merges the mobility data from two cities and trains the representation jointly, and the other trains an adversarial model.

Due to the unsupervised learning setting, the paper uses the top-N visited place pairs to train the mapping function, which may be problematic in my opinion. Moreover, the representation matrix deals with embeddings of stay points, which may make an accurate mapping function expensive and difficult to learn.

CityCoupling: Bridging Intercity Human Mobility

Ubicomp 2016

Irregular Trajectory Prediction Transfer Learning

The paper utilizes an expectation-maximization (EM) algorithm to estimate the intercity spatial mapping. The intercity trajectory matching is treated as the latent variables and the intercity spatial mapping as the parameters. Then, a Gibbs sampling multiple hidden Markov model utilizes the Viterbi algorithm to generate simulated trajectories using the generated spatial mapping.

In my opinion, even though the idea of not using any prior knowledge is fascinating, simply matching trajectories and estimating optimal parameters can be computationally expensive and difficult to converge. And this process may neglect underlying patterns.

A Non-Parametric Generative Model for Human Trajectories

IJCAI 2018

Trajectory Simulation

The paper proposes a non-sequential non-parametric generative model to capture high-order geographic and semantic features of human mobility. It uses a matrix of discretized locations, where each cell contains information about the time and duration of visits to that cell in the given trajectory. The third dimension of the matrix (tensor) is the number of repeated visits to the same location. The paper uses generative adversarial networks to train the model.

The non-parametric idea is very fascinating, since it is difficult to find a comprehensive and sound approach to model human mobility. The proposed method “compresses” the time dimension. However, it also sacrifices the ability to explain the patterns in terms of human knowledge.

DeepMove: Predicting Human Mobility with Attentional Recurrent Networks

WWW 2018

Trajectory Prediction Embedding Multimodal RNN and Attention

The human mobility task here is prediction. The paper constructs multimodal embeddings for current and historical trajectories. Then the embeddings are fed into a Recurrent Neural Network. A historical attention module between historical and current trajectories is used to capture multilevel periodicity. The paper proposes two different attention candidate generators: one directly embeds records into independent latent vectors and samples candidates, while the other encodes the records sequentially and takes intermediate outputs as candidates. The workflow of the architecture for sequential data is typical. One interesting point is that the paper examines the attention matrix to explain the effectiveness.
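
As a rough illustration of the historical attention step, the sketch below scores each historical candidate against the current state and returns the weights plus the weighted context vector. Dot-product scoring is an assumption made for brevity, and the paper's actual scoring function may differ. The vectors are hypothetical toy values.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, candidates):
    """Dot-product attention: weights over candidates and their weighted sum."""
    scores = [sum(q * k for q, k in zip(query, cand)) for cand in candidates]
    weights = softmax(scores)
    dim = len(query)
    context = [
        sum(w * cand[d] for w, cand in zip(weights, candidates)) for d in range(dim)
    ]
    return weights, context

# Hypothetical current state and three historical candidate vectors.
query = [1.0, 0.0]
history = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights, context = attend(query, history)
```

Inspecting `weights` row by row is essentially what examining the attention matrix means: the candidate most similar to the current state receives the largest weight.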

However, in my opinion, the explanation is shallow, since it only relies on the maximum weight as evidence. More elaboration is needed in this part. Moreover, even though the author uses multimodal embeddings, the multimodal information is just location, user ID and time, which may not be so powerful. And the conversion from one-hot vectors to dense vectors may be unable to capture complicated features.

Fine-Grained Urban Flow Prediction

WWW 2021

Flow Prediction Embedding POIs CNN and GCN Multimodal

The task of the paper is to predict urban flow from historical data. The resolution is high (the grid size is ~150m), which is why the setting is called fine-grained. The author selects key timesteps as closeness, period and trend (recent, daily and weekly) to create different layers of the flow inputs, and uses non-shared convolutional layers to process them. The external factors (like weather and time) and POI information are processed by the meta learner. All four layers are concatenated and fed into a network to get high-dimensional embeddings for each grid. To reduce the number of parameters caused by the fine-grained setting, the author develops a region partition and uses a Graph Neural Network to consider global spatial dependencies and make predictions.
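
The selection of key timesteps can be sketched as simple index arithmetic. The function below is a minimal illustration: the name `select_key_timesteps` and the number of steps taken per group are hypothetical choices, not values from the paper.

```python
def select_key_timesteps(t, steps_per_day, n_closeness=3, n_period=2, n_trend=1):
    """Return the past time indices used as closeness, period and trend inputs for step t."""
    day = steps_per_day
    week = 7 * steps_per_day
    groups = {
        "closeness": [t - i for i in range(1, n_closeness + 1)],    # most recent steps
        "period": [t - i * day for i in range(1, n_period + 1)],    # same slot on previous days
        "trend": [t - i * week for i in range(1, n_trend + 1)],     # same slot in previous weeks
    }
    # Drop indices that would fall before the start of the series.
    return {name: [i for i in idx if i >= 0] for name, idx in groups.items()}

# Hypothetical setting: 30-minute slots (48 per day), predicting slot 400.
keys = select_key_timesteps(t=400, steps_per_day=48)
```

Each group would then be stacked into its own input tensor, which matches the idea of processing the groups with non-shared convolutional layers.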

Overall, it is a very good paper for its comprehensive use of mobility data and its treatment of external factors as embeddings. The paper gives a very clear description of the architecture. One problem in my opinion is the usage of fine-grained data, as it is expensive and can easily violate privacy. Moreover, the GCN setting may not be convincing, because the complete graph may not describe the actual connections between different regions.

Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting

IJCAI 2018

Traffic Prediction Graph and Convolution

This work focuses on traffic prediction (attributes of the traffic flow such as velocity). The paper constructs graphs directly based on the connectivity of observation stations. An approximation with a layer-wise linear structure is applied to reduce parameters. The author builds the model purely from convolutional networks. The spatial features are obtained by convolving over graph-structured data, while the temporal features are gathered by applying a 1-D convolution layer along the time axis. The spatial and temporal layers are processed jointly using a “sandwich” structure.
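
The “sandwich” idea can be sketched in a few lines, assuming a temporal-spatial-temporal ordering. Note that the spatial step below is a plain neighbor average, a deliberately simplified stand-in for the approximated graph convolution in the paper. The gating and learned weights are also omitted, and all names are hypothetical.

```python
def temporal_conv(signal, kernel):
    """signal[t][node]; valid 1-D convolution along the time axis for every node."""
    k = len(kernel)
    n_nodes = len(signal[0])
    return [
        [sum(kernel[j] * signal[t + j][v] for j in range(k)) for v in range(n_nodes)]
        for t in range(len(signal) - k + 1)
    ]

def spatial_mix(signal, adjacency):
    """Average each node with its graph neighbors (self included) at every time step."""
    n = len(adjacency)
    mixed = []
    for frame in signal:
        row = []
        for v in range(n):
            neighbors = [u for u in range(n) if adjacency[v][u] or u == v]
            row.append(sum(frame[u] for u in neighbors) / len(neighbors))
        mixed.append(row)
    return mixed

def sandwich_block(signal, adjacency, kernel):
    """Temporal convolution, then spatial mixing, then temporal convolution again."""
    return temporal_conv(spatial_mix(temporal_conv(signal, kernel), adjacency), kernel)

# Hypothetical toy input: 5 time steps, 3 stations on a line graph 0-1-2.
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
signal = [[1.0, 1.0, 1.0] for _ in range(5)]
output = sandwich_block(signal, adjacency, kernel=[0.5, 0.5])
```

Each temporal convolution with a kernel of length 2 shortens the sequence by one step, so 5 input steps become 3 output steps.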

The paper develops a general approach to process spatio-temporal sequences. The math behind the approximation is a little difficult for me. The mining of the graph structure is also a little too simple, as the author mainly focuses on balancing spatial and temporal layers.

Competitive Analysis for Points of Interest


Embedding POIs GNN Multimodal

The target is to study competitive relationships between POIs. The author summarizes POIs and constructs graphs (that is, generates edges between POIs) from two aspects. The first is the spatial relationship through heat maps. The second is the aspect relationship extracted from reviews and search histories.

Then these two graphs are fed into the network to predict competitive relationships from the spatial and knowledge sides. On the spatial side, to consider two spatial factors, location sectors and distance, the paper designs different convolution layers for different sectors and an attentive propagation layer that assigns distances to different buckets. Moreover, the competitive and compensative graphs are processed separately and concatenated at the end. On the knowledge side, the brand and aspect embeddings are learned using specific aggregation functions. Brands and aspects interact through cross attention.

The interesting point of the paper is its use of multimodal data, including reviews, search histories and spatial relationships of POIs. How they convert POIs into a graph structure is also a good point, since it considers different factors hierarchically. Even though the paper is not about mobility using POI information, it gives a good hint for delineating POI interactions.

Learning to Simulate Human Mobility


Trajectory Simulation Embedding GAN, CNN and Attention

The paper introduces a model-free generative adversarial framework to simulate human mobility. The generator embeds time and location from one-hot vectors. It then projects the dense representation vector into three vectors to enable self-attention. Three region-based matrices built from physical distance, function similarity and historical transitions are applied to the intermediate output to select information. This information is added to the intermediate vector and serves as the final result, just like a residual connection. The discriminator is based on a convolutional neural network. It converts the trajectory into a 2D feature matrix and convolves over the matrix. In addition, it uses the last version of the generator as an assistant to complete partial trajectories. The loss function accounts for distance and regularity. As a result, the travel distance between transitions is limited, and visiting the same locations at the same time is encouraged. The author also pretrains the model: next location prediction is used to pretrain the generator, and a binary classification task on whether trajectories exhibit regularities is used to pretrain the discriminator.

The paper balances model-based and model-free methodology. It is of great importance to achieve both performance and explainability of the model at the same time. For the mobility patterns, the paper considers distance, POI distribution, transitions and regularities, which may not be comprehensive enough to explain complicated patterns. The essential time factors (that is, the absolute time at which the activities happen) are also neglected. Unfortunately, the code has not been released, even though the authors said two years ago that they would release it.

A Review of Location Encoding for GeoAI: Methods and Applications

International Journal of Geographical Information Science, 2022


The paper reviews location encoding, which is useful for many downstream tasks. Different encoding methods have different properties, such as being distance preserving, direction aware, multi-scale, and parametric or not. The choice of encoding needs to consider the tradeoff between bias (restricted hypothesis space) and variance (flexible and large hypothesis space). Encoders also differ according to their input data. The paper mainly reviews single point location encoders, which receive point coordinates and differ in whether they consider neighbors in the space. It also briefly introduces encoders for polyline, polygon and raster data.

In my opinion, even though distance and direction are intuitive properties that location encoders should maintain, it is doubtful whether they are necessary, as other factors such as high-speed transportation may reduce the influence of distance. However, there is no denying that location encoders need to be unified under the same framework to enable interpretability and transfer learning.

A Survey on Deep Learning for Human Mobility

ACM Computing Surveys, 2021


The paper divides mobility tasks into next-location/flow predictions and flow/trajectory generations. It defines the different problems, lists related datasets as well as evaluation metrics and summarizes diversified deep learning architecture from various papers. The paper states key points of different settings. For next-location predictions and trajectory predictions, which are both for agent-level, the individual spatial and temporal patterns, including regular and irregular mobility, the external factors and user preferences need to be maintained. For crowd flow prediction, spatial patterns including influence of near-by and distant areas, temporal patterns including periodic, trends, recent ones (defined by different period lengths) and external factors need to be considered. However, for flow generation, only spatial patterns and geographic characteristics (similar to user preference on individual level) are needed. In the end, the paper also concludes challenges for human mobility problems, which are geographic transferability, explainability, privacy, tunability and consideration of interactions.

This review paper clarifies the definition of human mobility problems and key points for the solutions, which is of great significance. It also throws light on the difficulties that future research needs to overcome, with which I completely agree.

Urban Computing: Concepts, Methodologies, and Applications



The paper gives a review of urban computing. It introduces the concept of urban computing, categorizes its applications into several groups, and introduces the data forms and structures used in this domain.

This paper is a foundational work in urban computing.

Graph WaveNet for Deep Spatial-Temporal Graph Modeling

IJCAI 2019

Flow Prediction CNN and GCN

The paper proposes a graph convolution layer with a self-adaptive adjacency matrix. The output comes from three parts: the forward and backward (for directed graphs) diffusion processes of graph signals and the one from the self-adaptive adjacency matrix. This matrix is calculated by multiplying source and target embeddings, followed by ReLU activation and SoftMax normalization. Apart from that, it also adopts stacked dilated causal convolutions to capture long temporal dependencies, which form the temporal convolution layer. The key is the exponentially growing receptive field. This part is also gated to control the information flow. Overall, the training objective is to minimize the mean absolute error.
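
Two of the ideas above are easy to check numerically. The sketch below builds a self-adaptive adjacency matrix from hypothetical node embeddings (product, ReLU, then softmax, applied row-wise here as an assumption) and computes the receptive field of stacked dilated causal convolutions. In the real model the embeddings are learned, whereas here they are fixed toy values.

```python
import math

def adaptive_adjacency(source_emb, target_emb):
    """Row-wise softmax(relu(source @ target^T)) with plain Python lists."""
    adjacency = []
    for src in source_emb:
        scores = [max(0.0, sum(s * t for s, t in zip(src, tgt))) for tgt in target_emb]
        m = max(scores)
        exps = [math.exp(x - m) for x in scores]
        total = sum(exps)
        adjacency.append([e / total for e in exps])
    return adjacency

def receptive_field(kernel_size, dilations):
    """Number of past time steps visible after stacking dilated causal convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Hypothetical embeddings for two nodes.
source = [[1.0, 0.0], [0.0, 1.0]]
target = [[1.0, 0.0], [-1.0, 0.0]]
adj = adaptive_adjacency(source, target)

# Doubling the dilation at every layer doubles the receptive field.
fields = [receptive_field(2, [2 ** i for i in range(n)]) for n in range(1, 5)]
```

With a kernel size of 2 and dilations 1, 2, 4 and 8, the receptive field grows as 2, 4, 8, 16, which is the exponential growth mentioned above.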

The self-adaptive matrix is impressive, which unifies the solution of problems with and without graph structures. It also solves the constraint of application of graph convolution network on dynamic graphs (as it needs the adjacency matrix). Nevertheless, I think the sum of three parts may not be able to summarize the correct answer. The weighted sum may be a better choice.

Representing Urban Functions through Zone Embedding with Human Mobility Patterns

IJCAI 2018

Land Use Prediction Embedding

Inspired by word2vec, the paper constructs zone embeddings from origin-destination zone co-occurrences, which are extracted from taxi trajectories. The writer uses the zone as a word and the corresponding departed/arrived zone-time event pair as the context. Moreover, the paper uses travel demand, calculated with the gravity model of transportation analysis from two factors (total arrivals at destination zones and travel distance), to assign weekday/weekend importance. The goal of the model is to minimize the importance-weighted difference between the positive point-wise mutual information of co-occurrences and the decoded embedding results. Regarding the evaluation, the paper utilizes k-means clustering to partition the zone set into urban function clusters. The prediction uses the proposed embedding vectors, while the validation employs vectors of the percentages of different land use types in different zones.
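
The positive point-wise mutual information (PPMI) target can be sketched from raw co-occurrence counts. The event tuples and counts below are hypothetical, and the importance weighting and the embedding decoder from the paper are not modeled.

```python
import math
from collections import defaultdict

def ppmi(cooccurrence):
    """cooccurrence maps (zone, context) -> count; returns the PPMI for every pair."""
    total = sum(cooccurrence.values())
    zone_totals = defaultdict(float)
    context_totals = defaultdict(float)
    for (zone, context), n in cooccurrence.items():
        zone_totals[zone] += n
        context_totals[context] += n
    scores = {}
    for (zone, context), n in cooccurrence.items():
        if n <= 0:
            scores[(zone, context)] = 0.0
            continue
        pmi = math.log(n * total / (zone_totals[zone] * context_totals[context]))
        scores[(zone, context)] = max(0.0, pmi)  # keep only positive associations
    return scores

# Hypothetical contexts: (direction, other zone, hour of day).
morning_arrival = ("arrive", "B", 8)
evening_departure = ("depart", "B", 18)
counts = {
    ("A", morning_arrival): 8,
    ("A", evening_departure): 2,
    ("C", morning_arrival): 2,
    ("C", evening_departure): 8,
}
scores = ppmi(counts)
```

In this toy example zone A co-occurs with the morning arrival event more often than chance, so that pair gets a positive score, while the rarer pairing is clipped to zero.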

The construction of the co-occurrence pairs is interesting. Splitting the origin and destination into events of one zone enables the inclusion of timestamp and direction. Introducing the travel demand weight lets the embedding consider traffic-related factors. The use of k-means clustering in the evaluation is also a brilliant idea to test embeddings. However, it might not be effective sometimes, as the percentage may not elucidate the actual function of one zone. In addition, some zones may have several functions with different importance, so the clustering will neglect functions of minor importance, rendering the result incomplete.

Improving Land Use Classification using Human Mobility-based Hierarchical Place Embeddings

PerCom 2021

Trajectory Prediction Embedding RNN

As the data sparsity may deteriorate the performance of location embeddings, the paper makes use of spatial hierarchical information. They use the LSTM-based RNN for next place prediction tasks and get the by-product embeddings for locations, which is in the region form. The main contribution is to design the dimension of the embedding and divide different dimensions to different size regions hierarchically. Finally, the paper uses the land use dataset for a prediction task to test the performance.

In my opinion, hierarchical spatial embedding is an interesting idea, as we can generate embeddings for different levels at one time. The discussion of the dimension division is also valuable. However, the model structure is just a normal RNN and nothing special is done for embedding generation, which may cast doubt on where the effectiveness comes from. Also, the simple and direct division of the embedding may be questionable, as hierarchical information tends to be more complicated and may not fit such a simple space.

Location Embeddings for Next Trip Recommendation

WWW 2019

Next Location Prediction Embedding GCN and Recommendation System Multimodal

The task of the paper is to recommend cities for travelers, which can be seen as a binary classification task. First, the model constructs a knowledge graph and learns the representation of cities from Wikipedia (by a TF-IDF weighted sum of word vectors) and location-based social network data (by TransE). Then, the paper combines two existing recommender systems and learns user behaviors from their booking histories, which also include personal profiles. All the parts above are combined as a deep component. Another component is a feature factorization machine component with contextual information. These two components are concatenated and fed into a multilayer perceptron. In the data preprocessing, they use a trained classifier to select leisure trips and only keep travelers whose history contains a number of destinations over a threshold.
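
The TF-IDF weighted sum of word vectors for a city description can be sketched as follows. The smoothed IDF variant, the toy corpus and the two-dimensional word vectors are all assumptions made for illustration, and the paper may use a different TF-IDF formulation.

```python
import math
from collections import Counter

def tfidf_weighted_embedding(doc_tokens, corpus, word_vectors):
    """Sum the word vectors of a document, each weighted by its TF-IDF score."""
    term_counts = Counter(doc_tokens)
    n_docs = len(corpus)
    dim = len(next(iter(word_vectors.values())))
    embedding = [0.0] * dim
    for word, count in term_counts.items():
        if word not in word_vectors:
            continue  # skip words without a pretrained vector
        doc_freq = sum(1 for doc in corpus if word in doc)
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0  # smoothed IDF (assumed variant)
        weight = (count / len(doc_tokens)) * idf
        for d in range(dim):
            embedding[d] += weight * word_vectors[word][d]
    return embedding

# Hypothetical city descriptions and word vectors.
corpus = [["beach", "sun", "city"], ["museum", "city"], ["beach", "surf"]]
word_vectors = {
    "beach": [1.0, 0.0],
    "museum": [0.0, 1.0],
    "city": [0.5, 0.5],
    "surf": [0.9, 0.1],
}
beach_city = tfidf_weighted_embedding(corpus[0], corpus, word_vectors)
museum_city = tfidf_weighted_embedding(corpus[1], corpus, word_vectors)
```

Words that are rare across the corpus pull the city vector harder, so the museum city ends up closer to the "museum" direction than to the shared "city" direction.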

The structure of the paper is elegant. The writer implements several existing approaches to construct the model, so it may be necessary to read related papers to understand it. This work is mainly related to recommender systems, and the research level is the city, so the contextual information is easy to collect. However, for our human mobility problems, the resolution may not be enough and most of this information is not available.

STUaNet: Understanding Uncertainty in Spatiotemporal Collective Human Mobility

WWW 2021

Trajectory Prediction Uncertainty RNN, GCN and Recommendation System Multimodal

This paper focuses on the variation of human mobility predictions and quantifies the uncertainty from internal data quality and external contextual interactions. The model includes two parts: a human mobility prediction task and an uncertainty learning task. First, mobility prediction is backboned by a graph convolution module within an LSTM. For next-interval predictions, the paper considers three scales: the nearest intervals, the same intervals in the most recent days, and the average mobility intensity of the same intervals in each day of the last week. They use a gravity-based model to indicate transition patterns from urban mobility flow. The other module uses the same input to quantify uncertainties for internal spatiotemporal dependencies and external context influence. On the one hand, period-wise sequence similarities are computed to estimate data quality. On the other hand, a factorization machine (for contextual interactions) and a graph convolution network (for spatial dependency) are used to compute external factors and learn mapping functions from different contextual factors to region-level uncertainty. Moreover, the model actively injects controllable Gaussian noise for guidance, with two weakly supervised indicators: data quality estimation results and variance. In addition, a gate-based bridge is used to reduce learned uncertainty from predictions.
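
The three input scales for next-interval prediction can be sketched for a single region as follows. The function name, the number of recent intervals and the number of recent days are hypothetical, and the weekly scale is interpreted here as the mean of the same interval over the previous seven days.

```python
def multi_scale_inputs(series, t, steps_per_day, n_recent=3, n_days=3):
    """Return (nearest intervals, same interval on recent days, last-week mean) for step t."""
    if t < 7 * steps_per_day:
        raise ValueError("need at least one full week of history before t")
    recent = [series[t - i] for i in range(n_recent, 0, -1)]
    daily = [series[t - d * steps_per_day] for d in range(n_days, 0, -1)]
    last_week = [series[t - d * steps_per_day] for d in range(1, 8)]
    weekly_mean = sum(last_week) / len(last_week)
    return recent, daily, weekly_mean

# Hypothetical hourly flow counts for one region (value equals the time index).
series = list(range(200))
recent, daily, weekly_mean = multi_scale_inputs(series, t=180, steps_per_day=24)
```

Because the toy series equals its own index, the returned values directly show which time steps each scale looks at.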

The paper offers a special perspective on human mobility predictions. Uncertainty can make prediction results vary a lot under different data or settings, so it is essential to consider it. The paper develops a seamless structure for both internal and external uncertainty estimation. Nonetheless, all calculators are neural networks, and the uncertainty labels are results from neural estimators and statistical variation, which may not be powerful enough and may sometimes make it difficult to obtain accurate results. More studies are needed on ground-truth labels for uncertainty, in my opinion.

Origin-Destination Matrix Prediction via Graph Convolution: a New Perspective of Passenger Demand Modeling


OD/Flow Prediction LSTM and GCN Embedding

This paper predicts the origin-destination matrix from taxi ride-hailing datasets. The authors aggregate features in the manner of graph message passing and feed them into LSTM models. To solve the problem of data sparsity, which makes a straightforward application of GCN infeasible, the paper considers geographical neighbors (from the first law of geography) and semantic neighbors (from the OD matrix). Each grid aggregates information from these two kinds of neighbors by concatenating the two vectors. Instead of training embeddings, the paper trains the weight matrix in the aggregation function to select feature information. Moreover, before aggregation, the two kinds of neighbors are pre-weighted according to the distance and the total number of demands. The sequential grid vector representations are fed into a period-skip LSTM to incorporate temporal information of passenger demands. The main task of the paper is to predict the OD matrix using grid embeddings via a transition matrix. Based on the proposed grid embeddings, the paper also conducts two subtasks that predict the number of incoming and outgoing demands in each grid at different time slots. The loss combines the three parts mentioned above.

This paper proposes the interesting concepts of geographical and semantic neighbors and uses pre-weighted factors as a kind of prior knowledge to aggregate them. It is a great idea to introduce spatial features into a general graph. In my opinion, the backbone of the model is mainly the LSTM, and the paper derives hidden states from message-like features. In addition, the subtasks somewhat overlap with the main task, as we can calculate the incoming and outgoing volumes from the OD matrix.
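
The overlap between the subtasks and the main task is easy to see in code: given an OD matrix, the outgoing demand of a grid is a row sum and the incoming demand is a column sum. The matrix below is a hypothetical toy example.

```python
def in_out_demand(od_matrix):
    """od_matrix[i][j] = trips from grid i to grid j; returns (incoming, outgoing) per grid."""
    outgoing = [sum(row) for row in od_matrix]
    incoming = [sum(column) for column in zip(*od_matrix)]
    return incoming, outgoing

# Hypothetical OD matrix for three grids in one time slot.
od = [
    [0, 3, 1],
    [2, 0, 0],
    [4, 1, 0],
]
incoming, outgoing = in_out_demand(od)
```

Total incoming and total outgoing demand are always equal, since both count every trip exactly once.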

Predicting Origin-Destination Flow via Multi-Perspective Graph Convolutional Network

ICDE 2020

OD/Flow Prediction LSTM and GCN Embedding

This paper uses an LSTM and multiple graph convolutions to predict the OD matrix at the next time step. First, the paper treats each OD pair individually (Intra-Region). It extracts five historical values (one week ago, one day ago and the three most recent time slots) and feeds them into an LSTM to obtain the initial hidden vector for OD pairs. Besides, the author uses 24-hour traffic volumes averaged over different days and calculates the cosine correlation between two origins and between two destinations. The result is used as features. Then, three different graphs are constructed (Between-Region): one dynamic graph built from historical OD flows and two static graphs, namely an adjacency graph built from geographic information and a POI graph built from POI information. It conducts GCN for both origins and destinations to obtain a 2D GCN and uses a linear regression to get the predicted traffic matrix. The three graphs are trained separately and the prediction is their average.
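
The correlation feature between two regions can be sketched by averaging the hourly volumes over days and comparing the resulting 24-hour profiles. The code below uses plain cosine similarity, and the profiles are hypothetical. How the paper turns this score into graph features is not modeled.

```python
import math

def average_daily_profile(days):
    """days is a list of per-day volume lists; returns the mean volume per hour."""
    return [sum(hour_values) / len(days) for hour_values in zip(*days)]

def cosine_similarity(a, b):
    """Cosine of the angle between two profiles; 0.0 if either profile is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical 24-hour profiles: two morning-peak origins and one evening-peak origin.
morning = [10.0 if 7 <= h <= 9 else 1.0 for h in range(24)]
busier_morning = [2.0 * v for v in morning]
evening = [10.0 if 17 <= h <= 19 else 1.0 for h in range(24)]

same_shape = cosine_similarity(morning, busier_morning)
different_shape = cosine_similarity(morning, evening)
```

Cosine similarity ignores the overall volume and compares only the shape of the daily curve, so two morning-peak regions match even if one is twice as busy.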

Due to the length of the paper, the LSTM part is not quite clear. The idea of three graphs is interesting and reasonable. One possible disadvantage is that the grid size is large (~3km) and they only consider OD pairs that are usually non-zero. Neglecting some zero OD pairs may lose information. Moreover, I am doubtful about stacking more layers to consider more hops of neighbors, as it will cause over-smoothing problems.

Learning Effective Road Network Representation with Hierarchical Graph Neural Networks


Hierarchical Representation Learning GNN Embedding

The author proposes a framework to model the road network at the levels of road segments, structural regions and functional zones using GNNs. The paper considers road segments as vertices, and uses road segment ID, road type, lane number, segment length, and longitude with latitude as features. All features are treated as discrete, with continuous features divided into consecutive bins. To form region nodes from road segments, the model first conducts a hard binary mapping using spectral clustering: it runs the standard K-means algorithm on eigenvectors of the graph Laplacian of the road segment adjacency matrix. Moreover, a GAT is used to calculate segment importance scores for different clusters. The hard mapping and importance scores are multiplied to obtain weighted scores, which determine the soft assignment matrix of road segments to structural regions. The weighted scores are used to get both the region representation and the adjacency matrix of regions. The assignment matrix is trained by reconstructing the road network from the approximated segment representation. The process is similar for constructing functional zones from structural regions. The assignment matrix is the normalization of scores from another GAT. The zone representation is a linear combination of region representations. The weighted adjacency matrix for zones is computed from the representation matrix. The difference is that real trajectory data are used for structure information. The trajectories are converted to a road segment transition matrix: the frequency with which one segment reaches another segment with a given step length over all trajectory sequences. The updated connectivity matrix is the previous adjacency matrix plus the transition matrix. The reconstruction task is similar, but the MSE loss is used to measure the difference. After that, the model fixes the two assignment matrices and updates node representations. First, it performs the zone level update via a standard GCN. Then it updates the region level similarly via another GCN. Finally, it updates the segment level via a GAT (as the adjacency matrix is binary). As for the experiment, the road network is based on OpenStreetMap, and the trajectories from three cities are mapped to road segments from the GPS logs. The performance is analyzed on four traffic-related applications using simple standard neural network architectures: next-location prediction, label classification of road segments, destination prediction and route planning.
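
The trajectory-based connectivity update can be sketched as follows. The step length is interpreted here as "reachable within at most `max_step` hops along a trajectory" and the counts are row-normalized into frequencies. Both choices are assumptions, since the notes do not pin down the exact definition, and the function names are hypothetical.

```python
def transition_matrix(trajectories, n_segments, max_step=1):
    """Row-normalized frequency of reaching segment j from segment i within max_step hops."""
    counts = [[0.0] * n_segments for _ in range(n_segments)]
    for trajectory in trajectories:
        for i, source in enumerate(trajectory):
            for target in trajectory[i + 1 : i + 1 + max_step]:
                counts[source][target] += 1.0
    for row in counts:
        total = sum(row)
        if total > 0:
            for j in range(n_segments):
                row[j] /= total
    return counts

def updated_connectivity(adjacency, transition):
    """Previous adjacency matrix plus the transition matrix, element by element."""
    return [
        [a + t for a, t in zip(adj_row, trans_row)]
        for adj_row, trans_row in zip(adjacency, transition)
    ]

# Hypothetical road network with three segments and three map-matched trajectories.
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
trajectories = [[0, 1, 2], [0, 2], [0, 1]]
transition = transition_matrix(trajectories, n_segments=3)
connectivity = updated_connectivity(adjacency, transition)
```

Segments 0 and 2 are not adjacent in the toy network, but one trajectory moves directly between them, so the updated connectivity gives that pair a non-zero weight.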

The paper proposes a novel framework for the urban hierarchical structure. The definitions of structural regions and functional zones are well organized. The core part is the reconstruction task for training the needed assignment matrices. Different types of neural networks and loss functions are carefully chosen to fit the purpose. However, in my opinion, the feature selection for road segments may not be effective, as features like ID and coordinates are not informative for the proposed tasks. In addition, the signal for the reconstruction task may need more consideration. There is no doubt that it closely relates to the graph structure, but it is not clear whether the signal can reflect the success of the reconstruction task. The calculation of the weighted adjacency matrices from the representation matrices also remains unexplained.

Learning Urban Region Representations with POIs and Hierarchical Graph Infomax

ISPRS 2023

Hierarchical Representation Learning GCN POIs Embedding

This paper proposes an unsupervised learning model for embeddings of POIs on POI-region-city hierarchies. The author adopts second-level POI categories (e.g., supermarket, mall and grocery). First, a graph on the POI level is constructed using Delaunay triangulation. Each edge is assigned an unnormalized weight depending on the spatial distance and on whether both POIs are within the same region. The category embeddings are learned using a random walk model to consider co-occurrence and distance decay. Laplacian Eigenmaps is also applied to force second-level categories with the same first-level category to be adjacent in the embedding space. Then, a one-layer GCN encoder is applied on the category embeddings to generate unique embeddings for single POIs. Second, the region graph is constructed based on adjacency relations. The POI embeddings are aggregated to the region level using multi-head attention inspired by the set transformer. Then, region embeddings are passed to another one-layer GCN. Third, region embeddings are aggregated to the city level using area-weighted sum pooling with a sigmoid function. Maximization of mutual information is the essence of the unsupervised learning in this paper. Negative sampling (contrastive learning) is performed at both the POI and region levels. For a region, the model maximizes the mutual information between the region embedding and the embeddings of POIs within the region. The negative samples are POI embeddings from another region. Specifically, the paper uses hard negative sampling. First, the author represents each POI as the concatenation of its first-level and second-level categories and uses the average as a rough region embedding. The negative samples are selected from regions that have a certain cosine similarity with the rough embedding of the target region. For a city, region embeddings are positive samples, while the negative samples come from row-wise shuffling of the POI graph feature matrix (corrupted graphs). Accordingly, the loss function contains two parts: POI-region and region-city. As for the experiment, there are 22 first-level and ~170 second-level categories for POIs. The dimension of the POI category embedding is set to 64. Traffic analysis zones from the road network are treated as regions. Three downstream tasks are designed: prediction of urban function, population density and housing price. The regression task is analyzed by a random forest model using RMSE, MAE and correlation coefficient, while the classification task is analyzed by a simple MLP using L1 distance, KL divergence and cosine similarity.
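
The two kinds of negative samples can be sketched as follows. The similarity band `[low, high]` used for hard negatives is a hypothetical choice, since the notes only say "a certain cosine similarity". The rough region embeddings and feature rows are toy values.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def hard_negative_regions(target, rough_embeddings, low=0.6, high=0.8):
    """Regions similar to the target, but not too similar, under the rough embeddings."""
    anchor = rough_embeddings[target]
    return [
        region
        for region, embedding in rough_embeddings.items()
        if region != target and low <= cosine(anchor, embedding) <= high
    ]

def corrupt_features(feature_rows, seed=0):
    """Row-wise shuffle of the POI feature matrix, used as the corrupted graph input."""
    rng = random.Random(seed)
    shuffled = list(feature_rows)
    rng.shuffle(shuffled)
    return shuffled

# Hypothetical rough region embeddings (averaged POI category vectors).
rough = {"A": [1.0, 0.0], "B": [0.7, 0.7], "C": [0.0, 1.0], "D": [1.0, 0.01]}
negatives = hard_negative_regions("A", rough)

# Hypothetical POI feature rows.
features = [[1, 0], [0, 1], [1, 1], [0, 0]]
corrupted = corrupt_features(features)
```

Region C is too different to be a useful negative and region D is almost identical to the target, so only region B falls inside the band. The shuffle keeps every feature row but detaches it from its node in the graph.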

Overall, the idea of unsupervised learning is fascinating, as the quality of real-time data has a large influence in the supervised learning setting. The unified POI categorical information and the unique single POI information are captured together in the proposed model. Moreover, cross-level embeddings are handled differently for POI-region and region-city. However, the hierarchical relation is modeled by two disjoint graphs that are connected only through embeddings. It may fail to capture the joint relationship if the two parts are trained separately. In addition, the hard negative sampling and the negative samples at the region level may need more explanation to prove their effectiveness.

Last updated: March 28, 2023