By Federico Ricciuti, Machine Learning Specialist, Tre Altamira

linkedin.com/in/federico-ricciuti-b490ab59

Persistent Scatterer Interferometry (PSI, Ferretti et al. (2001)) is a multitemporal InSAR technique that measures the surface displacement of radar targets exhibiting a stable return. The input of a PSI analysis is a set of co-registered SAR acquisitions; the output is a georeferenced point cloud with a temporal dimension. For each point, the following information is provided (Figure 1):

  • Temporal Coherence
    • Scalar value correlated to the stability of the measurements
  • Displacement Time Series
  • Displacement velocity
    • Scalar value derived from the Displacement Time Series
  • Coordinates (Latitude, Longitude)
Figure 1: Overview of a PSI analysis

Within a PSI point cloud, all the displacement time series have the same length and the same temporal sampling, both of which depend on the satellite used and the time frame of the acquisitions.
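The list above describes the displacement velocity only as a scalar derived from the displacement time series. The following is a minimal sketch of one common way to obtain such a scalar, assuming (this is not stated here) that the velocity is the slope of a least-squares line fitted to the series. The function name, the units (years, millimetres) and the sample values are illustrative only.

```python
def displacement_velocity(times_years, displacements_mm):
    """Slope of the least-squares line through (time, displacement) pairs.

    Assumption: the velocity is taken as a linear trend in mm/year.
    The actual PSI processing chain may derive it differently.
    """
    if len(times_years) != len(displacements_mm) or len(times_years) < 2:
        raise ValueError("need at least two samples, one displacement per time")
    n = len(times_years)
    mean_t = sum(times_years) / n
    mean_d = sum(displacements_mm) / n
    covariance = sum(
        (t - mean_t) * (d - mean_d)
        for t, d in zip(times_years, displacements_mm)
    )
    variance = sum((t - mean_t) ** 2 for t in times_years)
    if variance == 0:
        raise ValueError("all acquisition times are identical")
    return covariance / variance


# Hypothetical series: four acquisitions, steady subsidence.
times = [0.0, 0.5, 1.0, 1.5]
series = [0.0, -1.0, -2.0, -3.0]
velocity = displacement_velocity(times, series)
```

Because every series in a point cloud shares the same acquisition times, the time-dependent terms (mean and variance) could be computed once and reused for all points.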

For the DeepCube project, one of the models we created is a Graph Neural Network (GNN) that predicts the reliability of each point in an input PSI point cloud.

After the application of a PSI analysis over an area, InSAR experts and geologists are sometimes forced to manually remove those points that contain unreliable measurements not associated with “real” deformation. For this task, experts analyze PSI displacement velocity maps in combination with other external layers, e.g., Digital Elevation Model (DEM), SAR amplitude, Land use/cover maps (LULC), Figure 2.

Figure 2: Point removal activity made by InSAR and geologist experts

The reliability prediction model follows the same rationale. To train it, three different PSI point clouds (~100,000 points) were manually labeled as reliable or unreliable, using the displacement velocity map and the temporal coherence of each point from the PSI analysis, together with a LULC layer, a DEM layer and a SAR amplitude layer. The SAR amplitude layer was obtained by averaging all the SAR amplitude images acquired over the period of the PSI analysis.

Figure 3: Features associated with each point

The external layers were integrated into the model by extracting a small image patch (80×80) from each of them (DEM, LULC, SAR), centered on the position of each point. In total, 19,202 (80×80×3 + 2) features were obtained for each point, Figure 3.
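The feature count can be checked with a small sketch. It assumes that the two extra features are the displacement velocity and the temporal coherence, that the layers are already resampled to a common pixel grid, and that patches crossing the raster border are padded with a fill value. None of these details are stated above, and the function names are hypothetical. The patches are flattened here only to make the count visible; a CNN encoder would normally receive them as images.

```python
PATCH_SIZE = 80  # patch side length in pixels, as in the text


def extract_patch(layer, row, col, size=PATCH_SIZE, fill=0.0):
    """Return a flattened size x size patch of `layer` centered on (row, col).

    `layer` is a list of rows. Pixels outside the raster are replaced by
    `fill` (a padding choice assumed for this sketch).
    """
    half = size // 2
    n_rows, n_cols = len(layer), len(layer[0])
    patch = []
    for r in range(row - half, row - half + size):
        for c in range(col - half, col - half + size):
            inside = 0 <= r < n_rows and 0 <= c < n_cols
            patch.append(layer[r][c] if inside else fill)
    return patch


def point_features(layers, row, col, velocity, coherence, size=PATCH_SIZE):
    """Concatenate one patch per external layer with the two PSI scalars."""
    features = []
    for layer in layers:
        features.extend(extract_patch(layer, row, col, size))
    features.extend([velocity, coherence])
    return features


# Toy 100 x 100 rasters standing in for the DEM, LULC and SAR amplitude layers.
dem = [[float(r) for _ in range(100)] for r in range(100)]
lulc = [[1.0] * 100 for _ in range(100)]
sar = [[0.5] * 100 for _ in range(100)]

features = point_features([dem, lulc, sar], row=50, col=50,
                          velocity=-2.0, coherence=0.9)
print(len(features))  # 19202 = 80 * 80 * 3 + 2
```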

Figure 4: Model architecture. ConvNeXt (Liu et al. (2022)), depths: 2/2/4/2, dims: 12/24/48/96

The architecture of the model is depicted in Figure 4. The network was trained end-to-end, from the pointwise feature extraction (PW Encoder) to the application of the graph layers and the final output layer (GNN Encoder). During training, a simple Neighbor Sampling strategy was employed. Different architectural choices were evaluated, e.g. GNN layer types (UniMP (Shi et al. (2020)), GATv2 (Brody et al. (2021)), DyResGEN (Li et al. (2020))), pointwise encoders and multimodal fusion techniques.

Runtime Optimization

After training, the inference pipeline had to be optimized because of the scale of the target application (processing PSI point clouds containing millions of points). One of the most time- and resource-consuming steps is the extraction of the pointwise features. To optimize this step, this part of the model was run in an optimized runtime environment. After several comparisons and portability considerations, ONNX Runtime with the CUDA Execution Provider (https://onnxruntime.ai/) was selected. It speeds up the inference of Deep Learning models by optimizing the model's computational graph and its execution. ONNX Runtime shortens the inference execution time by a factor of 1.85 with respect to PyTorch, Table 1.

PyTorch | ONNX – CUDA EP | Speed Up
4.8 ms  | 2.6 ms         | 1.85
Table 1: ONNX vs Torch execution environment, average results of the Pointwise Encoder over 100 batches of 32 points, T4 NVIDIA GPU
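The figures in Table 1 are per-batch averages. The sketch below shows how such an average and the resulting speed-up could be computed. It is not the benchmark code used for the table: `average_latency_ms`, `speed_up` and the untimed warm-up runs are assumptions made for illustration.

```python
import time


def average_latency_ms(run_batch, batches, warmup=5):
    """Average wall-clock time per batch, in milliseconds.

    `run_batch` is any callable that runs the model on one batch.
    The first `warmup` batches are run once without timing, so that
    one-off initialisation costs are excluded (a choice made for this sketch).
    """
    for batch in batches[:warmup]:
        run_batch(batch)
    start = time.perf_counter()
    for batch in batches:
        run_batch(batch)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / len(batches)


def speed_up(baseline_ms, optimized_ms):
    """Ratio between the baseline and the optimized latency."""
    return baseline_ms / optimized_ms


# Using the averages reported in Table 1.
print(round(speed_up(4.8, 2.6), 2))  # 1.85
```

In practice `run_batch` would wrap the PyTorch forward pass in one case and the `run` method of an `onnxruntime.InferenceSession` in the other. On a GPU, work may be launched asynchronously, so the timed callable should wait for the results before returning.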

Sampling Optimization

The structure of the network offered another important optimization opportunity: the pointwise and graph layer computations can be split into two separate steps. Applying the entire network with the same sampling strategy used during training (Neighbor Sampling) would have limited its use in large-scale scenarios. To overcome this, two different sampling strategies were chained one after the other. For the pointwise computation, only a standard Batch Sampling strategy was applied, as the graph structure was not needed, and only the outputs of the last layer of the pointwise encoder were saved. The positions of the points were then used to create the graph associated with the PSI point cloud and to construct the edge features, while the outputs of the pointwise encoder were used to initialize the node features for the GNN encoder, Figure 5.

Figure 5: Sampling procedure, psi.shp is the shapefile containing the results of the PSI analysis
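The second stage of this procedure can be sketched as follows. How the graph is actually built from the point positions is not specified above, so the k-nearest-neighbour rule, the use of the distance as the only edge feature, and the dictionary keys (`x`, `edge_index`, `edge_attr`, borrowed from a common graph-library convention) are all assumptions of this example.

```python
import math


def knn_edges(positions, k):
    """Connect every point to its k nearest neighbours.

    Brute force, O(n^2): fine for a toy example, while a spatial index
    would be needed for millions of points.
    """
    sources, targets, distances = [], [], []
    for i, (xi, yi) in enumerate(positions):
        candidates = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(positions)
            if j != i
        )
        for distance, j in candidates[:k]:
            sources.append(j)   # message flows from neighbour j ...
            targets.append(i)   # ... to point i
            distances.append(distance)
    return sources, targets, distances


def build_graph(positions, embeddings, k=2):
    """Node features come from the saved pointwise-encoder outputs."""
    if len(positions) != len(embeddings):
        raise ValueError("one embedding per point is required")
    sources, targets, distances = knn_edges(positions, k)
    return {
        "x": embeddings,                        # node features
        "edge_index": [sources, targets],       # graph connectivity
        "edge_attr": [[d] for d in distances],  # edge features
    }


# Toy point cloud: three points on a line with 2-dimensional embeddings.
positions = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
embeddings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
graph = build_graph(positions, embeddings, k=1)
```

Because the embeddings are computed once per point and then reused as node features, a point that appears in the neighbourhood of several others never passes through the pointwise encoder again.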

This sampling procedure accelerated the inference pipeline, increased the GPU utilization of the Pointwise Encoder and avoided unnecessary repeated computations. Although it is not explicitly shown in Figure 5, the Feature Extraction and Pointwise Encoder steps were applied sequentially to subsets of the original point cloud, as the memory required to apply them to the entire PSI point cloud can be very high, see Figure 6. In future work, the possibility of processing these subsets in parallel should be explored.

Figure 6: Breakdown of the Feature Extraction and Pointwise Encoder steps
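The breakdown into subsets can be illustrated with a short sketch. The helper names, the chunk size and the toy stand-ins for the feature extraction and the encoder are hypothetical; the batch size of 32 matches the one reported in Table 1.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def encode_point_cloud(points, extract_features, encoder,
                       chunk_size, batch_size=32):
    """Run feature extraction and the pointwise encoder subset by subset.

    Only the features of the current subset are held in memory;
    the much smaller embeddings are accumulated for the whole cloud.
    """
    embeddings = []
    for chunk in chunked(points, chunk_size):
        features = [extract_features(point) for point in chunk]
        for batch in chunked(features, batch_size):
            embeddings.extend(encoder(batch))
    return embeddings


# Toy stand-ins for the real feature extraction and encoder.
def toy_extract(point):
    return [point, point]


def toy_encoder(batch):
    return [sum(features) for features in batch]


embeddings = encode_point_cloud(list(range(10)), toy_extract, toy_encoder,
                                chunk_size=4, batch_size=3)
```

Each iteration of the outer loop is independent of the others, which is what would make the parallel processing of subsets possible.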

Docker Optimization

Our inference code was released as a Docker image that was then executed in a cloud environment. Reducing its dependencies, and therefore its size, lowers storage and build costs. To reduce the size, multi-stage builds were used to construct the Docker images: the first build stage downloads and compiles all the dependencies (numpy, sklearn, etc.), and the second build stage imports them. An important advantage of using ONNX Runtime is the potential reduction in the number of Python dependencies, since PyTorch is no longer necessary for the execution of the model. Starting from the Docker images used for training the model (derived from the NVIDIA NGC PyTorch catalog: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch), the base image size can be reduced by a factor of more than three by combining the multi-stage approach with the removal of the PyTorch dependency, Table 2.

Base Image | Size
NVIDIA PyTorch Image for Training + Requirements (https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch) | ~14GB
NVIDIA CUDA Image for Inference + ONNX Runtime + Requirements (https://catalog.ngc.nvidia.com/orgs/nvidia/containers/cuda) | ~4GB
Table 2: Docker Images used for training and inference

So far, the PyTorch dependency has been removed only for the Pointwise Encoder component of the model, not for the GNN Encoder. Because both components currently run in the same environment, a Docker image that also includes PyTorch had to be created for the execution of the inference pipeline. The optimised version of this PyTorch Docker image weighs ~6GB. As a next step, the PyTorch dependency should also be removed from the GNN Encoder so that it can be converted to the ONNX format.

Future Optimizations

To avoid repeated computations, we will evaluate applying the CNN component of the Pointwise Encoder directly to the external layers and associating the extracted feature maps with the original locations of the points (similar to Faster R-CNN). So far, ONNX Runtime is used only for the execution of the Pointwise Encoder and not for the GNN Encoder; we will explore the conversion of the GNN Encoder to the ONNX format in the future. The Pointwise Encoder no longer depends on PyTorch, whereas the GNN Encoder still depends on it for some data preprocessing and sampling operations. We would like to remove this dependency in order to decrease the size of the Docker image used for the inference pipeline, Table 2. In several experiments with Mixed Precision training, the accuracy did not degrade. Therefore, we would like to go further in this direction by quantizing the model and using Quantization Aware Training strategies in the future.

Acknowledgements

All the optimization ideas and implementations that were presented are the results of the collaboration between our internal R&D and IT teams, in particular:

Alessandro Menegaz, https://it.linkedin.com/in/alessandro-menegaz-38008218

Pietro Panzeri, https://it.linkedin.com/in/pietro-panzeri-50007488

Roberto Ciatti, https://it.linkedin.com/in/roberto-ciatti-2a04392

Daniele Molteni, https://www.linkedin.com/in/moltenidaniele/

References

Ferretti, Alessandro, Claudio Prati, and Fabio Rocca. “Permanent scatterers in SAR interferometry.” IEEE Transactions on Geoscience and Remote Sensing 39.1 (2001): 8-20.

Liu, Zhuang, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, and Saining Xie. “A ConvNet for the 2020s.” arXiv preprint arXiv:2201.03545 (2022).

Brody, Shaked, Uri Alon, and Eran Yahav. “How Attentive are Graph Attention Networks?” arXiv preprint arXiv:2105.14491 (2021).

Shi, Yunsheng, Zhengjie Huang, Shikun Feng, Hui Zhong, Wenjin Wang, and Yu Sun. “Masked Label Prediction: Unified Message Passing Model for Semi-Supervised Classification.” arXiv preprint arXiv:2009.03509 (2020).

Li, Guohao, Chenxin Xiong, Ali Thabet, and Bernard Ghanem. “DeeperGCN: All You Need to Train Deeper GCNs.” arXiv preprint arXiv:2006.07739 (2020).