Spatio-Temporal Data Analysis using Deep Learning

Arun
May 16, 2021


This article is based on the paper titled “Deep Learning for Spatio-Temporal Data Mining: A Survey”. My goal here is to touch upon some of the aspects covered in the paper: what the data looks like, what the techniques are, and some of the use cases, all from a spatio-temporal data analysis perspective. I have stuck to the format of the paper to the best of my ability.

Spatio-Temporal Data arises in scenarios where data is collected across both time and space. The ubiquity of spatio-temporal data today is unquestionable. The explosion of GPS devices, mobile phones with sensors and significant improvements in sensor technology have created multiple avenues for such data to be collected. Some examples of such data include traffic speed data, epidemiological data, mobility data etc.

What makes ST data different?

Despite this explosion, the amount of research is only now catching up. The delay is in part because ST data is not very amenable to traditional data-mining techniques, for a variety of reasons:

  • Classical datasets are discrete while ST data is embedded in a continuous space i.e., the data is captured at different points in space and time
  • Patterns in ST data also exhibit correlations and are not independently generated

As an example, consider the spread of a contagion through a country. There is an identifiable spatial correlation in mortality rates: states that are closer to each other are more strongly correlated than states that are farther apart. On top of this there is also a temporal correlation. The presence of such autocorrelation breaks the “independent samples” assumption that is foundational to traditional data mining techniques.

So how does deep learning tackle these issues?

Deep learning has proven to be very effective at modeling non-linear relationships, and this can be leveraged to mine ST data.

In particular, deep learning models can automatically learn hierarchical feature representations from raw ST data. Spatial proximity and temporal correlations have been shown to be well captured by CNNs and RNNs respectively: CNNs have shown great accuracy in capturing features from image data, while RNNs have been used to model data with temporally dynamic behavior.

Another important aspect of deep learning is its layered function approximation capability. DNNs are based on the idea that any unknown complex non-linear function can be learned, given enough layers. As the data moves through the layers, higher abstractions of the underlying model can be discovered. This makes deep learning applicable to a wider range of problems in ST data.

What are the different kinds of ST data?

ST data can be produced by different data sources in different forms. It is important to understand the kinds of data produced so as to eventually represent the data for any data mining task. ST data can broadly be divided into event data, trajectory data, point reference data, raster data and video data.

Event Data

Events are data collected at a point location and a point in time. Disease outbreaks, crimes and traffic accidents are all examples of event data.

Trajectory Data

Trajectory data, as the name suggests, records the space occupied by a subject at each point in time; it is the path of the subject. Good examples of such data are sensor data captured for moving bodies, traffic data and location-based services.

Point Reference Data

Point reference data consists of measurements of a continuous field captured at a set of reference points over a certain area and certain periods of time. An example would be the humidity at different points in a state over a period of time. The key point to note is that the measured field is continuous, not discrete.

Raster Data

Raster data is the measurement of a continuous or discrete ST field at fixed locations and fixed time intervals. A good example of such data is fMRI images of the brain captured over fixed time intervals to detect changes in blood flow.

Video Data

Video data exhibits both spatial correlation and temporal evolution, so it is also a form of ST data.

How do you represent the data?

Before venturing into the architectural patterns for deep learning, it is important to understand how this data has to be represented.

An ST point is a tuple containing the spatial and temporal components along with additional information.

A time series with the location specified is ideal to record, say, trajectory data. If the same time series captured the evolution of a variable at a fixed location, it would be raster data. Alternatively, the data for different locations at a specific timestamp, i.e., a spatial map, is also raster data.

Eventually, all this data can be captured as a sequence, a 2D matrix or a 3D tensor, depending on the data type and the instance being represented.
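
As a rough illustration (the grid sizes and values here are toy assumptions of mine, not taken from the paper), here is how those three representations might look in NumPy:

```python
import numpy as np

# A trajectory as a sequence of (time, lon, lat) points.
trajectory = np.array([
    [0, 0.00, 0.00],
    [1, 0.05, 0.02],
    [2, 0.11, 0.06],
])                                    # shape (3, 3): 3 points in space-time

# A spatial map: one variable measured over a 64x64 grid at one timestamp.
spatial_map = np.random.rand(64, 64)  # a 2-D matrix

# Raster data: the same grid sampled at 24 time steps.
raster = np.stack([np.random.rand(64, 64) for _ in range(24)])
print(raster.shape)                   # (24, 64, 64) -> a 3-D tensor
```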

What are the neural networks commonly used in solving STDM tasks?

While architecting a neural network is highly dependent on the nature of the data and the task at hand, a quick recap of some of the neural networks that are in use in STDM will help with understanding the solutions better.

Restricted Boltzmann Machines (RBM)

An RBM is a two-layer stochastic neural network employed for dimensionality reduction, classification etc. An RBM tries to learn a binary code or representation of the input, which makes it well suited to feature learning.

RBM
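
As a hedged sketch of the idea rather than anything prescribed by the survey, a binary RBM trained with a single step of contrastive divergence could look roughly like this (layer sizes and learning rate are arbitrary assumptions):

```python
import torch

class RBM(torch.nn.Module):
    """Binary RBM trained with one step of contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden):
        super().__init__()
        self.W = torch.nn.Parameter(0.01 * torch.randn(n_hidden, n_visible))
        self.b_v = torch.nn.Parameter(torch.zeros(n_visible))
        self.b_h = torch.nn.Parameter(torch.zeros(n_hidden))

    def sample_h(self, v):                      # hidden units given visible
        p = torch.sigmoid(v @ self.W.t() + self.b_h)
        return p, torch.bernoulli(p)

    def sample_v(self, h):                      # visible units given hidden
        p = torch.sigmoid(h @ self.W + self.b_v)
        return p, torch.bernoulli(p)

    def cd1_update(self, v0, lr=0.01):          # one contrastive-divergence step
        ph0, h0 = self.sample_h(v0)
        _, v1 = self.sample_v(h0)               # "reconstruction" of the input
        ph1, _ = self.sample_h(v1)
        with torch.no_grad():
            self.W += lr * (ph0.t() @ v0 - ph1.t() @ v1) / v0.shape[0]
            self.b_v += lr * (v0 - v1).mean(0)
            self.b_h += lr * (ph0 - ph1).mean(0)

rbm = RBM(n_visible=64, n_hidden=16)
batch = torch.bernoulli(torch.rand(32, 64))     # 32 toy binary feature vectors
rbm.cd1_update(batch)
codes, _ = rbm.sample_h(batch)                  # learned binary representation
```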

Convolutional neural networks (CNN)

CNNs are a class of deep, feed-forward artificial neural networks that are applied to analyze visual imagery.

CNN

A typical CNN consists of: the input layer, which accepts the input; the convolutional layer, which determines the output of neurons as a scalar product of their weights and the local input regions; the pooling layer, which downsamples along the spatial dimensions to reduce the number of parameters; the fully connected layer, which connects the layers together; and the output layer, which produces the final output. CNNs have wide applications in imagery, raster data and spatial map processing.
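
Here is a minimal sketch of that layer stack in PyTorch, applied to a single-channel spatial map such as a 32x32 city grid (all sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # convolutional layer
    nn.ReLU(),
    nn.MaxPool2d(2),                             # pooling: 32x32 -> 16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 16x16 -> 8x8
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 64),                   # fully connected layer
    nn.ReLU(),
    nn.Linear(64, 1),                            # output layer
)

spatial_maps = torch.randn(4, 1, 32, 32)         # a batch of 4 spatial maps
prediction = cnn(spatial_maps)                   # -> (4, 1)
```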

GraphCNN

CNNs are designed to process images, which can be represented as a regular grid in Euclidean space. However, there are many applications where data is generated from a non-Euclidean domain such as a graph, and the graph convolutional network (GCN) was developed for exactly this setting.

GCNN

A GCN employs stacked convolutional layers to generate latent embeddings of nodes that capture information about neighbors several hops away. The embeddings are then fed to feed-forward layers for classification or regression.
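
Below is a minimal sketch of a graph convolution layer in this spirit, with a toy road network; the normalization and sizes are my assumptions, not a specific model from the survey:

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """One graph convolution: mix each node's features with its neighbors'."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, a_norm, x):          # a_norm: (N, N), x: (N, in_dim)
        return torch.relu(self.linear(a_norm @ x))

def normalize_adjacency(a):
    a_hat = a + torch.eye(a.shape[0])      # add self-loops
    d_inv_sqrt = a_hat.sum(1).pow(-0.5)
    return d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]

# Toy road network: 4 sensors (nodes), each with 8 input features.
adj = torch.tensor([[0., 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
x = torch.randn(4, 8)
a_norm = normalize_adjacency(adj)
h = GraphConv(8, 16)(a_norm, x)            # 1 layer ~ 1-hop neighborhood
h = GraphConv(16, 16)(a_norm, h)           # 2 stacked layers ~ 2-hop neighborhood
```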

RNN and LSTM

Recurrent neural networks are a class of neural networks designed to recognize the sequential characteristics of data and predict the next likely scenario. This is achieved by storing historical information in their internal state while processing variable-length sequences of inputs.
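
For instance, a minimal LSTM that predicts the next value of a univariate ST time series (say, speed readings from one road sensor) might be sketched as follows; shapes and hyperparameters are illustrative assumptions:

```python
import torch
import torch.nn as nn

class NextStepLSTM(nn.Module):
    def __init__(self, n_features=1, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_features)

    def forward(self, x):                 # x: (batch, time, features)
        out, _ = self.lstm(x)             # hidden state at every time step
        return self.head(out[:, -1])      # predict from the last step

model = NextStepLSTM()
window = torch.randn(16, 24, 1)           # 16 samples, 24 past readings each
next_value = model(window)                # -> (16, 1)
```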

Seq2Seq

A sequence-to-sequence (Seq2Seq) model is a general framework in which an input sequence is mapped to an output sequence; the two need not be of the same length. It has applications in machine translation and speech recognition. Seq2Seq models are widely used in ST prediction tasks where the ST data presents strong temporal correlations, such as urban crowd flow data and traffic data.
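
A hedged sketch of such an encoder-decoder, rolling out a few future steps of a series autoregressively (the dimensions and horizon are arbitrary assumptions):

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Encoder summarizes the observed sequence; decoder unrolls future steps."""
    def __init__(self, n_features=1, hidden=64, horizon=6):
        super().__init__()
        self.horizon = horizon
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.decoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_features)

    def forward(self, x):                  # x: (batch, past_steps, n_features)
        _, state = self.encoder(x)
        step = x[:, -1:, :]                # seed the decoder with the last observation
        preds = []
        for _ in range(self.horizon):      # autoregressive decoding
            dec_out, state = self.decoder(step, state)
            step = self.out(dec_out)
            preds.append(step)
        return torch.cat(preds, dim=1)     # (batch, horizon, n_features)

model = Seq2Seq()
past = torch.randn(8, 12, 1)               # 8 series, 12 past steps each
future = model(past)                       # -> (8, 6, 1)
```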

Autoencoder (AE) and Stacked AE

An autoencoder is an unsupervised learning technique in which an artificial neural network, consisting of an encoding layer, a hidden layer and a decoding layer, learns an encoding for a given dataset. Autoencoders have applications in clustering, classification and anomaly detection. A stacked autoencoder (SAE) is a neural network consisting of multiple layers of sparse autoencoders in which the outputs of each layer are wired to the inputs of the successive layer.

Autoencoders
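
A minimal autoencoder sketch on flattened spatial maps (the sizes are assumptions; a stacked autoencoder would chain several such encoders, each trained on the previous one's codes):

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, n_inputs=1024, code_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_inputs, 128), nn.ReLU(),
                                     nn.Linear(128, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 128), nn.ReLU(),
                                     nn.Linear(128, n_inputs))

    def forward(self, x):
        code = self.encoder(x)             # compressed representation
        return self.decoder(code), code

model = Autoencoder()
x = torch.rand(8, 1024)                    # 8 flattened 32x32 spatial maps
recon, code = model(x)
loss = nn.functional.mse_loss(recon, x)    # train by minimizing reconstruction error
```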

What is the general framework to solve an STDM problem?

A general framework for solving STDM problems is similar to how deep learning problems are generally solved, but the devil is in the details. The steps would ideally be:

  • The raw data from the different ST data sources has to be assembled and stored. The data instances take the form of time series, spatial maps, points or raster data.
  • The ST data thus constructed has to be further processed to fit the different deep learning models. Data is usually represented as sequence data, 2D matrices, 3D tensors or graphs.

Based on the data, one can choose RNNs or LSTMs (temporal data), CNNs (spatially correlated data), or a hybrid model that can handle both. Finally, the deep learning models thus constructed can be used to handle various tasks such as prediction and classification.

Steps in an STDM Task

ST Data Preprocessing

The goal of data preprocessing in STDM is to transform the data into a format that the deep learning model can handle. The data collected can sit in temporal databases or in formats like ArcGIS. While the data can be represented naturally using formats like tensors, vectors or matrices, the data transformations also depend on the model that will be used.

Consider trajectory or time series data, which can be represented as sequence data. But trajectory data can also be represented as a matrix, so that a CNN model can be used to capture spatial features: the ST field of a trajectory is encoded in a matrix in which all the cells the path passes over are set to 1 while the rest are set to 0. The CNN model can then be applied to this matrix to derive the spatial features, as in the sketch below.
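
A toy sketch of that rasterization step (grid size and bounding box are assumptions of mine):

```python
import numpy as np

def trajectory_to_grid(points, grid=32, bbox=(0.0, 0.0, 1.0, 1.0)):
    """points: array of (lon, lat); returns a (grid, grid) 0/1 matrix."""
    x_min, y_min, x_max, y_max = bbox
    mat = np.zeros((grid, grid), dtype=np.float32)
    for x, y in points:
        col = min(int((x - x_min) / (x_max - x_min) * grid), grid - 1)
        row = min(int((y - y_min) / (y_max - y_min) * grid), grid - 1)
        mat[row, col] = 1.0                # mark every cell the path visits
    return mat

path = np.array([[0.1, 0.1], [0.15, 0.2], [0.3, 0.4], [0.8, 0.9]])
image_like = trajectory_to_grid(path)      # this matrix can be fed to a CNN
```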

Spatial map data can be represented as a 2D matrix, but it can also be modeled as a graph. Consider the case where sensors are deployed along expressways, such that the sensors are the nodes and road segments are the edges. Such data can be fed to a GraphCNN model to predict traffic speed; a toy construction of such a graph is sketched below.
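
As a hedged illustration of building that graph, one common recipe (used, for example, in DCRNN-style work) is a thresholded Gaussian kernel over pairwise sensor distances; the coordinates and threshold below are made up:

```python
import numpy as np

coords = np.array([[0.0, 0.0], [1.0, 0.1], [2.1, 0.0], [2.0, 1.5]])  # sensor positions
dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)    # pairwise distances
sigma, threshold = dist.std(), 0.1

adj = np.exp(-(dist ** 2) / (sigma ** 2))   # similarity decays with distance
adj[adj < threshold] = 0.0                  # sparsify weak edges
np.fill_diagonal(adj, 0.0)                  # drop self-edges
# `adj` can now be fed to a GraphCNN-style model to predict traffic speed.
```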

An ST raster can be represented as a 2D matrix or a 3D tensor. For example, fMRI data from the brain can be represented as a matrix by extracting the correlations between the time series of pairwise brain regions for brain activity analysis.
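
A small sketch of that construction with NumPy (the number of regions and time points below are arbitrary assumptions):

```python
import numpy as np

n_regions, n_timepoints = 90, 200
bold = np.random.randn(n_regions, n_timepoints)   # one time series per brain region
connectivity = np.corrcoef(bold)                  # (90, 90) pairwise correlation matrix
# `connectivity` can be treated as an image-like matrix (or a graph)
# and passed to a CNN/GCN for brain activity analysis.
```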

Designing a deep neural network for STDM

Different Models based on the Data Representation
Once the data has been transformed, the next step is to determine which model would best fit the data and the STDM task that has to be solved. Data representation has a big impact on the kind of neural network that can be used. Data with temporal continuity requires one approach, while spatial feature learning requires a completely different one. Designing a neural network for the data at hand requires an understanding of this.

Recurrent neural networks allow previous outputs to be used as inputs while maintaining hidden states. This makes them suitable for processing data that has a temporal quality to it: sequence data, time series data and spatial maps captured over a time period all fall into this category, and RNN, LSTM, GRU, Seq2Seq, AE and hybrid models are the usual choices here.

On the other hand, GraphCNN is designed to process graph data and capture the spatial correlations among neighboring nodes. CNNs in general are very good at handling the spatial features of data, so they are a good option for processing spatial maps, rasters etc.

To process ST data, a common pathway is a hybrid model combining both RNN and CNN, where the RNN captures the temporal aspect while the CNN processes the spatial features; a sketch follows below. There has also been a trend of network embeddings, multi-layer perceptrons, generative adversarial networks, residual nets and deep reinforcement learning being used to solve STDM tasks.
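
A hedged sketch of one such hybrid, where a small CNN encodes each spatial map and an LSTM models how the encodings evolve over time (all sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
            nn.Flatten())                      # each frame -> 16 * 4 * 4 = 256 features
        self.lstm = nn.LSTM(256, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                      # x: (batch, time, 1, H, W)
        b, t = x.shape[:2]
        feats = self.cnn(x.flatten(0, 1))      # run the CNN on every frame
        feats = feats.view(b, t, -1)
        out, _ = self.lstm(feats)              # temporal modeling of the features
        return self.head(out[:, -1])           # predict from the last step

model = CNNLSTM()
frames = torch.randn(2, 12, 1, 32, 32)         # 12 spatial maps per sample
y_hat = model(frames)                          # -> (2, 1)
```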

Designing for the STDM task that has to be solved
Once the type of neural network for the data at hand has been determined, it is important to design the layers that will contribute to the expected output. Based on the survey, the major STDM problems and approaches, along with use cases, are detailed below.

A. Predictive Learning

The primary purpose of predictive learning on ST data is to predict future observations based on historical data. Predictive problems can lead to different architectures based on the data instances that form the input and the output. Point data is usually transformed into time series or spatial maps, and deep learning models are then applied; crime statistics, traffic accidents and social events fall into this category. In one case, the crime distribution over a region was transformed into heat maps, and the heat maps were used as input to hierarchical structures of residual convolutional networks to train a crime prediction model. In another case, point data of traffic accidents was modeled along with time and location as a 3D tensor and fed to a ConvLSTM to predict the subtypes of future events that may occur at different locations.

Time series, on the other hand, can be fed to stacked autoencoders to learn features. This is what was done with traffic flow time series data for road-segment-level traffic flow prediction. Deep belief networks have also been used to predict traffic flow based on past traffic flow observations. But the most common tool used to predict time series data is the RNN/LSTM, with modifications. More often than not with time series data, even though it is spatial in nature, the spatial aspect may not have relevance and is ignored.

Spatial maps have consistently been predicted using CNNs. Spatial maps are represented as image-like matrices, and a CNN prediction model can be used to capture the spatial features. ConvLSTM is a sequence-to-sequence prediction model where each layer has a convolutional structure in both the input-to-state and state-to-state transitions; it can take spatial map matrices as both input and output. Spatio-temporal prediction has been achieved by stacking multiple ConvLSTM layers to catch the ST patterns hidden in the data. ConvLSTM has also been used as part of the FCL-Net model, which fuses ConvLSTM layers and standard LSTM layers to forecast, for example, passenger demand in different regions of a city. Another interesting neural network module that has been proposed is the Attentive Crowd Flow Machine (ACFM). ACFM uses an attention mechanism, through two progressive ConvLSTM units connected with a convolutional layer, to infer the evolution of crowd flow by learning a dynamic representation of the time series data. GraphCNN and hybrid methods have also been utilized to predict spatial maps. A sketch of a single ConvLSTM cell follows below.
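
To make "convolutional structure in both transitions" concrete, here is a hedged sketch of one ConvLSTM cell; it follows the standard formulation rather than any specific paper's code, and the sizes are assumptions:

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Like an LSTM cell, but gates are computed by convolutions, so the
    hidden state keeps its spatial layout (channels x height x width)."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        # one convolution produces all four gates at once
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):               # x: (B, in_ch, H, W)
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o, g = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o), torch.tanh(g)
        c = f * c + i * g                      # convolutional state update
        h = o * torch.tanh(c)
        return h, c

cell = ConvLSTMCell(1, 16)
h = c = torch.zeros(2, 16, 32, 32)
for t in range(5):                             # unroll over 5 spatial maps
    h, c = cell(torch.randn(2, 1, 32, 32), (h, c))
```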

Modeling ST data as graphs has been a constant in data mining. For example, traffic data can be naturally modeled as a graph, and GraphCNN and GraphRNN can then be applied to learn a prediction task. Traffic flow has been predicted using a DCRNN (Diffusion Convolutional Recurrent Neural Network); the idea here is to use a diffusion process on a directed graph to capture the spatial and temporal dependencies of the traffic flow of the entire road network. That is, the spatial dependency is captured via random walks on the graph, and the temporal dependency through an encoder-decoder architecture with scheduled sampling.
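
A hedged sketch of the diffusion idea only (not the full DCRNN): the node signal is smoothed by a weighted sum of powers of the random-walk transition matrix, with the weights playing the role of learned filter coefficients:

```python
import numpy as np

def diffusion_conv(adj, x, theta):
    """adj: (N, N) weighted adjacency, x: (N, F) node signal, theta: K scalars."""
    d_out = adj.sum(axis=1, keepdims=True)
    p = adj / np.maximum(d_out, 1e-8)          # random-walk transition matrix
    out, p_k = np.zeros_like(x), np.eye(adj.shape[0])
    for theta_k in theta:                      # K diffusion (random-walk) steps
        out += theta_k * (p_k @ x)
        p_k = p_k @ p
    return out

adj = np.random.rand(5, 5) * (np.random.rand(5, 5) > 0.5)  # toy directed road graph
speeds = np.random.rand(5, 2)                              # 2 features per sensor
smoothed = diffusion_conv(adj, speeds, theta=[0.5, 0.3, 0.2])
```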

Prediction of rasters and trajectories aligns well with the sequence/spatial map problems and can usually be solved with the same techniques.

B. Representation Learning

Representation learning is used to learn higher abstractions of the data. A lot of work in representation learning has been on trajectories and spatial maps, which have a strong presence in human flow, location-based social networks and mobility services. This is usually achieved by linear/non-linear transformations of the input data. For example, a seq2seq-based model has been used to learn how similar trajectories are. In another example, a trajectory was transformed into a feature sequence which was then fed to a sequence-to-sequence autoencoder to learn fixed-length representations that can be used for clustering. CNNs have also been employed to learn representations from spatial maps and images.

C. Classification

The classification task is mostly studied in the context of fMRI data. Brain imaging data has been used along with deep learning to classify brain areas based on function and to classify diseases.

D. Estimation and Inference

Estimation and inference are particularly useful when there are few sources of data and it is valuable to understand the underlying distribution of the data. Current work on ST data estimation and inference mainly focuses on spatial maps and trajectories. In the case of spatial maps, feedforward neural networks have been used to model static data and RNNs to model sequential data, followed by hidden layers to capture feature interactions. Another interesting application has been a stacked denoising autoencoder built to automatically extract features from infrared cloud images and estimate the precipitation in a given area.

A very interesting application of ST estimation has been predicting the travel time of a path by modeling historical trajectory data with an RNN. Another application has been inferring the purpose of a trip from personal mobile phone GPS data. Travel demand analysis and transportation planning rely heavily on modeling ST data with CNNs to infer aspects such as travel mode, i.e., walk, bike, bus, driving or train.

E. Anomaly Detection

Anomaly detection, or outlier detection, is used to identify faults, rare events or outliers in data. It has been used to identify non-recurring disruptions in traffic caused by accidents, sporting events etc. Event data is the biggest research space for anomaly detection. For example, CNNs have been used to identify traffic anomalies caused by events. One study used a deep belief network and LSTM to identify the occurrence of accidents from social media, utilizing a year's worth of tweets and traffic data from New York and Virginia. A deep stacked denoising autoencoder has been proposed to learn hierarchical feature representations of human mobility, and these features were used for efficient prediction of traffic accident risk levels.

F. Attention Mechanism

Attention is a mechanism that was developed to improve the performance of encoder-decoder RNNs on machine translation. Encoder-decoder RNNs can have performance issues with long sequences because they rely on a fixed-length internal representation. To address this, the model is trained to pay attention to specific encoded elements of the source sequence while predicting each element of the target sequence. Attention can be classified as soft or hard, as temporal or spatial, and as local or global. Regardless of the flavor, attention is a vector of weights that is usually a function of the correlations between different elements; it is used to quantify the interdependence between the different features.
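
A minimal sketch of soft, dot-product-style attention as just such a weight vector (shapes are illustrative assumptions; real models learn the scoring function jointly with the rest of the network):

```python
import torch

encoder_states = torch.randn(10, 64)           # 10 time steps, 64-d each
decoder_state = torch.randn(64)                # current decoder state

scores = encoder_states @ decoder_state        # correlation-style scores, shape (10,)
weights = torch.softmax(scores, dim=0)         # attention weights, sum to 1
context = weights @ encoder_states             # weighted sum of encoder states, (64,)
```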

Conclusion

In this article, I have explored what spatio-temporal data mining is, the different representations used for ST data, and how some machine learning tasks can be achieved through deep learning. This article is a summary of some of the ideas and topics tackled in the paper I mentioned above.

Spatio-temporal data mining using deep learning has huge potential and has been gaining a lot of traction. But interpretability is a big open problem, both in STDM and in deep learning more broadly. With widespread application and ongoing research, this is something we can look out for.
