Marc PIERROT DESEILLIGNY and Ewelina RUPNIK
LaSTIG, ENSG-IGN, Gustave Eiffel University, Saint-Mande, France
Image matching is a well-known problem in the Earth sciences and features in two main applications: 1) the computation of 3D geometry from images acquired from different viewpoints and 2) the computation of surface deformations from images acquired from similar points of view. Many scientific communities (photogrammetry, robotics, computer vision, Earth science, etc.) have investigated image matching problems, which has led to many different naming conventions. In this chapter, we will interchangeably use the terms image matching and image correlation to describe the same concept, which is introduced in the following section. The image matching concepts and methods that are described in this chapter are presented in the context of optical data. Most of them can also be applied to SAR amplitude images. However, since the radiometry and geometry of SAR sensors are different from optical sensors, methods specific to SAR imagery will be presented in Chapter 3.
Given a pair of images, let us assume that for some physical reasons, one image is a deformation of the other. The goal of image matching is to recover this deformation, given the images. In mathematical terms, let I1 and I2 be the images we want to match, and let I2 be the deformation of I1 by a certain geometric function G. For each pixel p, we can then write:
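The equation itself was lost in extraction; from the surrounding definitions, and under the convention that G maps pixel coordinates of I1 into I2, a plausible reconstruction is:

```latex
I_2\bigl(G(p)\bigr) = I_1(p), \qquad \forall p \qquad\qquad [2.1]
```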
Equation [2.1] assumes that radiometry is directly comparable between the two images. This hypothesis is true in optical flow approaches (Horn and Schunck 1981) because the images are acquired from nearly the same viewpoints and at small time intervals. In Earth sciences, when images are collected (a) from different orbits over non-Lambertian scenes, (b) at different dates and (c) often using different spectral bands, this hypothesis does not hold true. To account for the different phenomena, we assume that the radiometry of I2 is altered by some unknown factor, and that the induced change is only locally constant (i.e. it depends on the surface material, the scene geometry, texture and atmospheric conditions). We then introduce R, the local radiometric deformation, and rewrite equation [2.1] as follows:
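The rewritten equation was elided; a plausible reconstruction, consistent with [2.1] and with R applied as a local radiometric deformation, is:

```latex
I_2\bigl(G(p)\bigr) = R\bigl(I_1(p)\bigr), \qquad \text{with } R \text{ locally constant} \qquad\qquad [2.2]
```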
We can now reformulate the matching problem: given two images I1 and I2, where one image is a deformation of the other following equation [2.2], recover G, and optionally R. Note that, formulated in this way, image matching is an ill-posed problem because an infinite number of solutions satisfy the equation. To see this, take the simpler equation [2.1] and set I1 = I2: any geometric transformation G that preserves the radiometry of I1 then satisfies the equation, so the solution is only determined up to such a transformation.
Matching can be carried out in several geometries, but for the purpose of this chapter, we only distinguish between two simple cases:
In this chapter, we treat the two cases identically. The reader should keep in mind that in the 1D case, the search region is a 1D interval, whereas in the 2D case, it is a 2D region (e.g. square, rectangle, circle, ellipse).
The most common image-matching approach is the so-called vignette or template matching. It allows us to narrow down the solutions to equation [2.2], which is justified by the following hypotheses about the R and G deformations:
Equations [2.3] and [2.4] may seem restrictive; however, in practice, it is sufficient that G is a smooth function and that we know an approximation G̃ of G (e.g. by assuming the mean altitude of the scene). When this is the case, we can compute a rectified image Ĩ2 of I2 that is defined by Ĩ2(p) = I2(G̃(p)), and compute the image deformation ΔG between I1 and Ĩ2. Finally, we set the output deformation to G = G̃ ∘ ΔG. This said, for the sake of simplicity, we assume in the following that equations [2.3] and [2.4] apply.
Assuming that equation [2.3] is true, we can state that the correspondence of a pixel p is a pixel p′ = p + δ(p) with |δ(p)| < dmax, where dmax is a given threshold. When dmax is sufficiently small, thus avoiding ambiguity and long computation, the matching problem boils down to computing, for each pixel p, its most similar pixel in the search region defined by δ ∈ [−dmax, dmax] (see Figure 2.1).
For a given pair I1, I2, let DS(p, p′) be the dissimilarity between two pixels p and p′ . If Nd is the neighborhood of size dmax, the matching algorithm is simply:
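The algorithm itself (the per-pixel argmin of equations [2.5]–[2.6]) can be sketched as follows. This is a minimal Python sketch, not the book's implementation: `match_pixel` and `ssd` are illustrative names, the search region is taken square, and the dissimilarity is passed in as a callback.

```python
import numpy as np

def match_pixel(I1, I2, p, d_max, n, ds):
    """Find the correspondence of pixel p = (row, col) in I2 by
    exhaustive search over the neighborhood |delta| <= d_max,
    minimizing the dissimilarity ds computed on (2n+1)^2 windows."""
    r, c = p
    best, best_q = np.inf, p
    w1 = I1[r - n:r + n + 1, c - n:c + n + 1]        # template around p
    for dr in range(-d_max, d_max + 1):
        for dc in range(-d_max, d_max + 1):
            rr, cc = r + dr, c + dc
            w2 = I2[rr - n:rr + n + 1, cc - n:cc + n + 1]  # candidate window
            score = ds(w1, w2)
            if score < best:
                best, best_q = score, (rr, cc)
    return best_q

def ssd(w1, w2):
    """Sum of squared window differences (one possible choice of DS)."""
    return float(np.sum((w1 - w2) ** 2))
```

For a synthetic pair where I2 is I1 shifted two pixels to the right, the returned correspondence of (10, 10) is (10, 12), as expected.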
We now define a similarity measure that is both unambiguous and efficient to compute.
Ignoring the radiometric deformation R, let us have a closer look at equation [2.1]. Assuming hypothesis [2.4], we define a dissimilarity measure using the following formula, where V is the neighborhood that defines a window:
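The formula was elided; given the discussion of equation [2.4] that follows, a plausible reconstruction is the sum of squared differences over the window:

```latex
\mathrm{DS}(p, p') = \sum_{v \in V} \bigl( I_1(p+v) - I_2(p'+v) \bigr)^2 \qquad\qquad [2.7]
```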
Note that the formula of equation [2.7] has a strong implication – it means that if p′ is the corresponding pixel of p, then for any v in the neighborhood V , p′ + v is the corresponding pixel of p + v. In other words, it is the hypothesis expressed in equation [2.4]; however, it is given in a discrete form. Of course, the key issue here is the size of the neighborhood V : if V is too small, then matching will be ambiguous; if V is too big, the above discrete hypothesis will not be true.
Let us now consider equation [2.2]. Due to radiometric deformation, for a pair of corresponding pixels p and p′, the equality I1(p) = I2(p′) no longer holds true. Therefore, the formula from equation [2.7] cannot be used. The general guideline to handle this inconsistency is as follows:
Rank normalization is one of the most general models, as it only imposes that R be a growing (monotonically increasing) function. In the continuous case, i.e. with an infinite number of measurements, the normalization consists of histogram normalization. In the discrete case, as in the matching of a window of finite size, this comes down to a rank normalization. That is, for each image Ik and each pixel q in the neighborhood of p, compute the number of pixels in the neighborhood V whose value is lower than that of q:
One of the drawbacks of this normalization is that it is relatively slow to compute. If N is the cardinality of V, a naive implementation has a cost of O(N²). Using a quicksort-type algorithm to globally sort all values in V(p), the computational complexity can be reduced to O(N log N).
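The sort-based O(N log N) variant can be sketched as follows. This is a minimal NumPy sketch with an illustrative name, and it assumes distinct values in the window (with ties, sort-based ranks differ slightly from the strict "count of lower values").

```python
import numpy as np

def rank_normalize(window):
    """Rank-normalize a matching window: replace each value by the
    number of window pixels with a lower value.  A global sort makes
    this O(N log N) instead of the naive O(N^2) pairwise count."""
    flat = window.ravel()
    order = np.argsort(flat, kind="stable")   # global sort of V(p)
    ranks = np.empty_like(order)
    ranks[order] = np.arange(flat.size)       # position in sorted order
    return ranks.reshape(window.shape)
```

Because ranks depend only on the ordering of values, the output is unchanged by any growing radiometric deformation R, which is exactly the invariance rank normalization is meant to provide.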
Most often, rank normalization is too general; therefore, more restrictive similarity measures are sought. A common model presumes that the radiometric deformation between I1 and I2 can be represented locally by an affine transform, with a being the translation and b the scaling factor:
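The local affine model, whose equation was elided, can be plausibly written as:

```latex
I_2(p' + v) = a + b\, I_1(p + v), \qquad \forall v \in V
```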
Using this assumption, we can compute normalized images independent of any affine transform within the window. This can be done in two steps:
1) render the radiometry invariant to translation by subtracting the average of the radiometries (equation [2.11]);
2) render the radiometry invariant to scaling by dividing it by its standard deviation (equation [2.12]).
where N is the number of pixels in the template (i.e. N = (2n + 1)2 for a square window of size n). For instance, in Figure 2.1, n = 1 and N = 9.
The average and standard deviations are computed from:
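The elided formulas can be plausibly reconstructed as the windowed mean and standard deviation (the symbols Ī and σ are assumed here, as the originals were lost):

```latex
\bar I_k = \frac{1}{N} \sum_{v \in V} I_k(p+v), \qquad
\sigma_k = \sqrt{ \frac{1}{N} \sum_{v \in V} \bigl( I_k(p+v) - \bar I_k \bigr)^2 }
```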
Immediately, we can see that the normalized radiometry is invariant to any translation and scaling of the radiometry:
The normalized cross-correlation is then defined as a dissimilarity measure DS:
Alternatively, and as is often encountered in the literature, it is known as:
The above two expressions are closely related by:
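A minimal NumPy sketch of the normalized cross-correlation follows; the `ncc` name is illustrative, and the dissimilarity is then commonly taken as DS = 1 − NCC (up to the exact convention used in the elided equations).

```python
import numpy as np

def ncc(w1, w2):
    """Normalized cross-correlation of two equally sized windows:
    both are centered and scaled before taking the mean product,
    making the score invariant to a local affine radiometric change."""
    a = (w1 - w1.mean()) / w1.std()
    b = (w2 - w2.mean()) / w2.std()
    return float(np.mean(a * b))      # in [-1, 1], 1 = perfect match
```

The affine invariance is easy to verify: replacing one window by a + b·w with b > 0 leaves the score unchanged.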
The DSNCC dissimilarity measure remains the most popular similarity measure. Its wide use is due to several advantages:
Rank distance and normalized cross-correlation use all the elements of the window to normalize the radiometry. As a result, in the presence of outliers, the normalization will be contaminated by false measurements. Outliers are not rare in image matching; for instance, when performing image matching for 3D geometry computation, outliers can arise systematically due to, for example, occlusions.
Census is a similarity measure that is robust to outliers and thus addresses the above shortcoming. To introduce census, let us define a binary operator:
The operator, applied to a window, converts it into a binary vector whose elements are true for all pixels larger than the value of the central pixel Ik(p). By comparing such binary vectors computed for a pair of windows (centered at p and p′ in images I1 and I2, respectively), we obtain the census coefficient:
What makes census robust is that the normalization is undertaken using only the value of the central pixel. Consequently, in the sum in equation [2.20], if there is an outlier, it will not influence the output score. Conversely, because of the strict normalization, the census coefficient is not very discriminant. With this in mind, we should apply census: (i) with large window sizes in order to bring “more” information or context into the measurement vector; or (ii) in combination with the regularization techniques described in section 2.3. A use case where census performs well is the computation of 3D scene geometry with severe occlusions.
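The census transform and its comparison can be sketched as follows (a minimal sketch with illustrative names; the coefficient is taken here as the Hamming distance between the two binary vectors, which matches the sum described in the text):

```python
import numpy as np

def census_vector(window):
    """Census transform of a window: a boolean vector marking which
    pixels are larger than the value of the central pixel."""
    center = window[window.shape[0] // 2, window.shape[1] // 2]
    return (window > center).ravel()

def census_distance(w1, w2):
    """Hamming distance between the two census vectors (the sum of
    equation [2.20]); a single outlier pixel changes the score by
    at most 1, which is the source of census's robustness."""
    return int(np.sum(census_vector(w1) != census_vector(w2)))
```

Note that, like rank normalization, census is invariant to any growing radiometric deformation, since only comparisons with the central pixel are used.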
There are many similarity measures and listing them all is beyond the scope of this chapter. Here, we mention several common alternatives:
Limiting the matching problem to the definition of a similarity measure is possible when the search space is limited. When the search interval is increased, two limitations arise. First, the likelihood of erroneous matching grows with the growing number of tested hypotheses and leads to noisy results. Second, the computation time increases proportionally to the number of tested hypotheses.
In this section, we describe the most typical ways to solve the above-mentioned problems. The methods presented in section 2.3.2 address the robustness aspect, while in section 2.3.3, both the robustness and time issues are dealt with.
As described in section 2.1.3, to solve the matching problem, certain hypotheses about the regularity of the unknown geometric deformation G must be adopted. However, simply computing the most similar correspondence of a given pixel gives no guarantee that the resulting deformation will comply with the regularity hypothesis. To impose it, the common practice is to force the geometric deformation to follow the hypothesis in equation [2.4].
A straightforward way to implement this behavior is to use an a posteriori filtering technique on the computed deformation values, for example:
However, a posteriori filtering is only a post-processing technique and does not use all the available information to recover the “correct” deformation. Let us look at the scenario illustrated in Figure 2.2. Over a relatively large area, two alternative geometric deformations are presented as a blue line and a dashed line, respectively. The blue line is wrong, although it results from the most similar correspondences, and the dashed line is the real deformation but the second best result. In this scenario, filtering techniques are not capable of recovering the real deformation.
Ideally, we would like to guide the algorithm to perform the following: compute a deformation that maximizes the similarity measure but if there is a deformation with good enough similarity that is smoother (therefore more likely) than the first, choose the latter. In other words, the algorithm searches for a trade-off between similarity and regularity.
To formalize this, for a given deformation G, let us define the dissimilarity DS and the regularity Reg terms:
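The elided definitions can be plausibly reconstructed from the descriptions that follow (the exact normalizations may differ in the original):

```latex
\mathrm{DS}(G) = \sum_{p \in I_1} \mathrm{DS}\bigl(p,\; G(p)\bigr) \qquad\qquad [2.21]
```
```latex
\mathrm{Reg}_\alpha(G) = \sum_{p \in I_1} \bigl| \partial (G - \mathrm{Id})(p) \bigr|^{\alpha} \qquad\qquad [2.22]
```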
where the terms are defined over all pixels p of image I1, ∂(G − Id) is the gradient of G minus identity and α can take practically any value, but it should be chosen based on the characteristics of the deformation G (see Figure 2.3). Of course, the remarks at the end of section 2.1.3 also apply here, and Id can be replaced by any smooth a priori approximation of G.
Our aim is to find a solution that simultaneously minimizes DS and Reg. With the concepts presented so far, this is impossible, because the solution minimizing DS is the argument of the minimum for each pixel independently (defined in equation [2.6]), while the solution minimizing the regularity is simply the identity. An energetic approach handles this trade-off in a neat way by defining a function E:
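Consistent with the description of the data and regularization terms, equation [2.23] plausibly reads:

```latex
E(G) = \mathrm{DS}(G) + \beta\, \mathrm{Reg}_\alpha(G) \qquad\qquad [2.23]
```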
Here, the deformation that we look for is defined as the deformation G that minimizes E. In equation [2.23], the term DS, known as the data term, depends only on the images, while Regα, known as the regularization, represents the a priori knowledge of the smoothness of G. The β factor allows us to control the relative importance of the data and regularization terms.
Currently, the energetic approach is the most common way to solve the matching problem, both in academic and commercial software solutions. It is an elegant and quite general way to combine the information brought by the data with the a priori information. When embarking on this approach, two non-trivial issues must be investigated: (1) the relation between the regularization term and the solution that will be favored; (2) once the function E is defined, the choice of the algorithm that can efficiently compute the deformation, minimizing equation [2.23]. An in-depth answer to these two questions is far beyond the scope of this chapter; however, in the following, we provide a brief commentary on the current methods.
Regarding the impact of the regularization term, Figure 2.3 illustrates how α given in equation [2.22] can be used to favor diverse geometries. The dashed interval represents no image information and the red line corresponds to a region for which the image information is good enough to find the solution (i.e. deformation). Let us suppose that the solution in the information gap is steered only by the regularization term. Depending on the value of α, we can obtain different solutions:
Concerning minimization algorithms, the most common methods are mincut–maxflow, semi-global matching (SGM) and variational methods. The mincut–maxflow computes the exact minimum of equation [2.23]; however, it imposes that α ≥ 1 and is relatively costly in time and space (i.e. it has high memory consumption). The SGM is a version of the dynamic programming method. It is not limited by the value of α and is efficient in both time and space. On the downside, it does not compute the exact minimum, which sometimes leads to unpleasant artifacts. Finally, the variational approach uses nonlinear least squares to solve equation [2.23]. As the deformation G is computed globally, there is no need to use the template method; as a result, the solution does not suffer from the artifacts specific to the window-based method. A drawback is that the variational approach requires good initialization of G.
When dealing with large deformations, the regularization approach partially solves the robustness issue but not the CPU issue. In fact, it increases the total computation time by having to include the regularization. Multi-resolution processing, a particular case of the “coarse to fine” strategy, can further improve the two aspects (Pierrot-Deseilligny and Paparoditis 2006; Rothermel et al. 2012). The strategy is to first obtain an approximate and fast (i.e. coarse) solution and then to use it to compute a finer solution. A basic multi-resolution approach can be described as follows:
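The elided steps can be sketched as a recursive coarse-to-fine procedure. This is a minimal sketch under stated assumptions: `match` stands for any per-level dense matcher (an assumed callback taking a search radius and an optional disparity prior), and the pyramid uses a factor-of-2 average pooling.

```python
import numpy as np

def downsample(img, f):
    """Average-pool the image by an integer factor f."""
    h, w = (img.shape[0] // f) * f, (img.shape[1] // f) * f
    return img[:h, :w].reshape(h // f, f, w // f, f).mean(axis=(1, 3))

def coarse_to_fine(I1, I2, match, d_max, levels=3):
    """Coarse-to-fine matching sketch: solve at the lowest resolution
    with the full (scaled) search range, then refine the upsampled
    solution at each finer level with a small residual search."""
    if levels == 0:
        return match(I1, I2, d_max, prior=None)
    coarse = coarse_to_fine(downsample(I1, 2), downsample(I2, 2),
                            match, max(1, d_max // 2), levels - 1)
    # disparities double when moving to the next (finer) resolution
    prior = 2 * np.kron(coarse, np.ones((2, 2), dtype=coarse.dtype))
    prior = prior[:I1.shape[0], :I1.shape[1]]
    return match(I1, I2, 1, prior=prior)   # small residual search
```

The key cost saving is that the large search range is only ever explored on the small, low-resolution images, while full-resolution matching is restricted to a ±1 pixel residual around the upsampled prior.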
This procedure is typically used recursively by computing a pyramid of images and starting the computation with the lowest resolution images. For example, in MicMac software (Rosu et al. 2015), when processing large satellite images, the downsampling currently goes to 1/32 or 1/64 at the lowest resolution.
Until now we have made the implicit assumption that for a given pixel p with integer coordinates, its corresponding pixel is also a pixel q with integer coordinates. The drawback of this assumption is that we can only compute integer deformations and this thus reduces the accuracy of the measurement. Sub-pixel deformations cannot be determined because they are invisible to the matching algorithm. Typically, if we work with satellite images of 10 m resolution, this “integer” approach would limit the accuracy to 5 m.
There are two ways to overcome this shortcoming, both based on interpolation, which we refer to as the interpolation of similarity and the similarity of interpolation. In the interpolation of similarity, for a given pixel, we (1) compute and memorize the similarity calculated in the search region, using integer pixels, (2) use some quadratic function to model the similarity function and finally (3) extract its maximum. In the similarity of interpolation, we select an interpolating algorithm and use it to extract the value of I2 using non-integer pixels (i.e. at sub-pixel positions). This approach is generally slower but more general. To limit the signal aliasing, precautions must be taken in selecting the interpolating kernel.
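For the interpolation of similarity, the standard quadratic model fits a parabola through the score at the best integer offset and its two neighbors, then takes the fractional position of the vertex. A minimal sketch (the function name is illustrative):

```python
def parabolic_refine(s_prev, s_best, s_next):
    """Sub-pixel refinement by 'interpolation of similarity': fit a
    parabola through the scores at offsets -1, 0, +1 around the best
    integer offset and return the fractional offset of its extremum,
    which lies in (-0.5, 0.5) when s_best is a true local extremum."""
    denom = s_prev - 2.0 * s_best + s_next
    if denom == 0.0:          # flat scores: no sub-pixel information
        return 0.0
    return 0.5 * (s_prev - s_next) / denom
```

The same formula works whether the scores are similarities (peak) or dissimilarities (valley), since the vertex of the fitted parabola is the same in both cases.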
Using such techniques, it is possible to achieve an accuracy of a fraction of a pixel. Nevertheless, the accuracy remains largely dependent on the images (e.g. the sensor, the scene type, etc.) and may vary from a fraction of a pixel in unfavorable conditions to 1/100 of a pixel in very favorable conditions.
In the previous sections, we used an implicit assumption that in both forms of matching (1D and 2D), we could approximately predict the position of corresponding pixels. This prediction is possible thanks to the known geometry of the sensors with which the images are acquired. The geometries describe position and rotation in 3D space and enable relating images taken from different viewpoints.
The two common classes of sensor geometries are frame and pushbroom geometries. Frame cameras use an area array sensor in which all the pixels share the same optical center and are exposed to light simultaneously1. Pushbroom cameras, in contrast, are composed of a linear sensor array and scan the scene along the trajectory of the sensor. As a result, each image line is acquired at a different time and from a unique position in space, following the perspective projection along the linescan direction and the orthographic projection in the perpendicular direction. In practice, the real pushbroom sensors used in spaceborne imaging have a complex layout, often composed of several detectors situated in the focal plane (Lier et al. 2008). To facilitate the use of the images, they undergo resampling to simulate an ideal sensor: a single-line pushbroom sensor, free from optical distortions and from the jitter artifacts caused by high-frequency attitude changes (De Lussy et al. 2005).
Until recently, frame cameras were more common in terrestrial and airborne applications, whereas pushbroom scanners were the standard in spaceborne imaging. Thanks to technological advances and lower costs, frame arrays are increasingly used in satellites (e.g. Planet’s CubeSat constellations).
Sensor orientation in 3D space is carried out in two steps by applying the so-called intrinsic and extrinsic calibration parameters. The intrinsic parameters allow us to associate a pixel measurement in the 2D image coordinates with a bundle in 3D space. The extrinsic parameters orientate the camera in the world coordinate frame; in other words, they allow us to georeference a camera. These steps are realized by three consecutive operations: transformation from the world to the camera coordinates; perspective projection from 3D to 2D camera coordinates; and finally from a 2D camera to image coordinates. This mapping can be represented in a compact equation form, referred to as the collinearity equation (see Figure 2.4):
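The collinearity equation was elided here; a plausible reconstruction, with π denoting the perspective projection (an assumed symbol, since the original was lost), is:

```latex
p_k = \mathcal{I}\bigl( \pi \bigl( R_k \, ( P - C_k ) \bigr) \bigr) \qquad\qquad [2.24]
```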
where Rk and Ck are the extrinsic orientation parameters of some image k; in particular, Rk is the rotation matrix that moves from world coordinates to camera coordinates, and Ck is the image perspective center in the world coordinates; π represents the perspective projection; I indicates the intrinsic parameters (i.e. the focal length, the principal point and the distortions); and pk denotes the image coordinates (i, j)k of the point P in image k.
Equation [2.24] holds true for frame cameras, i.e. when all pixels are exposed at the same instant. To take into account pushbroom cameras, we should generalize the above equation by including the time component t:
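The generalized equation was elided; a plausible reconstruction makes the extrinsic parameters functions of the acquisition time t of each image line (π again denotes the perspective projection, an assumed symbol):

```latex
p_k = \mathcal{I}\Bigl( \pi \bigl( R_k(t)\, \bigl( P - C_k(t) \bigr) \bigr) \Bigr) \qquad\qquad [2.25]
```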
The projection equation of this form is rigorous because it describes the passage from the 3D ground scene to the 2D image space using physical parameters. The physical nature of the parameters is considered advantageous because they (i) are seldom correlated, (ii) can be easily interpreted, and (iii) can be optimally adjusted. Nonetheless, for several reasons, the rigorous model is rarely used in practice. One of its main drawbacks is the lack of standardization among the satellite image providers, hence the difficulty in maintaining those models in various software solutions. Another drawback is due to its complexity and, in particular, the fact that the number of unknowns grows with the number of image lines. In response to this, several sensor replacement models have been proposed over the last 40 years. Among them are the polynomial image geometry model; the grid interpolation model; the rational function model (RFM), also known as the rational polynomial coefficient (RPC) model; and the replacement sensor model (RSM). Nowadays, the RPC model is the prevalent choice to describe the geolocation of a satellite sensor; therefore, we discuss it further in the next section.
Note that equation [2.25] does not account for ray refraction effects but assumes rays to be straight lines. In practice, image rays are bent as they pass through the different layers of the atmosphere, and to compensate for this phenomenon, image corrections are applied. The atmospheric refraction correction is inversely proportional to the square of the focal length; therefore, it is less pronounced for narrow-angle satellite sensors. For example, for near-vertical acquisitions by Pléiades satellites with an incidence angle of 15°, the geolocalization error reaches approximately 2 m (see the Pléiades Imagery User Guide). For more details on sensor models, see OpenGIS Consortium (1999) and McGlone (2013).
The RPC model relates the ground coordinates and the image coordinate using two rational polynomial functions:
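The two rational functions were elided; the standard form, with S and R denoting the normalized sample and row indices (hypothetical symbols, since the originals were lost), is:

```latex
S = \frac{p_1(L, P, H)}{p_2(L, P, H)}, \qquad
R = \frac{p_3(L, P, H)}{p_4(L, P, H)}
```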
where S and R denote the normalized sample (column) and line (row) indices; (λ, φ) are the latitude and longitude coordinates in the WGS-84 geodetic system and (L, P) are their normalized equivalents; h and H are the ellipsoidal altitude and its normalized equivalent, respectively; pi indicates polynomials of the third order; and each normalized quantity is defined in the range [−1, 1].
The accuracy of the RPC model is conditioned by the estimation method. We can distinguish between terrain-dependent (from ground control points (GCPs)) and terrain-independent (from the physical sensor model) estimation methods. If the RPCs are generated from the physical sensor model, the accuracy is generally as good as that of the model that served to estimate the coefficients. If the physical sensor model is not available (e.g. due to poor performance of the onboard direct measurement devices), the quality of the RPCs is highly dependent on the number as well as the distribution of GCPs, and deteriorates quickly with an insufficient number of points (Tao and Hu 2001b). It should be noted that the rational polynomial functions are generally not capable of modeling high-frequency distortions caused, for example, by a satellite’s vibrations. We owe their accuracy to the image pre-processing step that transforms the real sensor into an ideal one (De Lussy et al. 2005).
Even though RPC-encoded geolocation is precise with regard to the physical model, the accuracy – i.e. the absolute precision with respect to some Earth-fixed coordinate frame – is not guaranteed. The RPCs customarily delivered with Pléiades, SPOT-6/7, QuickBird, WorldView, PlanetScope, GeoEye, Ikonos, Cartosat, etc., can be misaligned by up to 10 pixels (CE90), equivalent to several meters on the ground (Oh and Lee 2015). Such deviations are due to errors in direct measurements on board the satellite (i.e. GNSS, gyros, star trackers), especially errors in attitude and position (Fraser and Hanley 2005). These may hinder further processing, for example, digital surface model generation with matching algorithms (d’Angelo 2013). Note that due to the short-term correlation of onboard measurements, errors in RPC geolocation from single-orbit acquisitions are less pronounced than errors from multi-orbit and multi-epoch acquisitions. The RPC model is therefore usually subject to refinement. For narrow field-of-view sensors, such as satellite pushbroom cameras, attitude errors induce the same net effect on the ground as in-track and across-track positioning errors. As a result of this correlation, sensor errors can be modeled with two correction functions defined in the image coordinates: (1) δf in the line direction, to absorb the in-track orbit and pitch errors; and (2) δg in the sample direction, to absorb the equivalent across-track errors (Grodecki and Dial 2003):
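The elided correction equations can be plausibly reconstructed as additive corrections in image space, with (s, l) the sample and line coordinates (hypothetical symbols) and the polynomials typically of low order, e.g. δg(s, l) = g0 + gs·s + gl·l:

```latex
s' = s + \delta g(s, l), \qquad l' = l + \delta f(s, l)
```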
where δg and δf are polynomials. The parameters of the correction functions are estimated in a least squares adjustment with image correspondences and GCPs used as observations. To strengthen the solution, the equations can include additional constraints imposing that the (δg, δf) displacement fields follow a global sensor rotation (Rupnik et al. 2016). The GCPs are commonly collected using existing databases of orthophoto images and global digital elevation models; for instance, the Sentinel-2 and Shuttle Radar Topography Mission (SRTM) DEMs (Kääb et al. 2017).
Given a pair of overlapping images and their known geometries, in order to predict the positions of their corresponding points, we often resort to the so-called epipolar geometry. Epipolar geometry is mainly useful in two tasks: relative orientation of an image pair and image matching for digital surface model (DSM) generation. During the relative orientation of frame cameras, the epipolar constraint is encoded in the essential matrix, from which the rotation and translation between the two images can be decomposed. In DSM computation, it reduces the correspondence search from a 2D problem (i.e. a pixel in one image matched against every pixel in the other image) to a line or a curve of candidates.
To illustrate the epipolar geometry of the original image pair, let us look at Figure 2.5. By applying the intrinsic and extrinsic parameters in equations [2.24]–[2.25] to some pixel p in image I1, we obtain a 3D bundle Bp associated with that position. This bundle, projected to an overlapping image I2 via a plane π anchored at the projection centers O1 and O2, will intersect the focal plane of the image I2, giving rise to the epipolar line ep (see Figure 2.5). In other words, for some pixel p, the above relationship predicts the position of its candidate corresponding points in another image.
In image matching, it is useful to separate the correspondence search from the orientation parameters. To undertake this, an image pair can be rectified to an equivalent geometry, so that the corresponding epipolar lines are aligned and parallel to image rows (see Figure 2.6). From then on, image matching comes down to finding conjugate points across lines with the same y-coordinate, with no regard to the image geometry. For frame cameras, the rectification to the equivalent geometry involves applying two 2D projective transformations to both images of a stereo pair. Unlike frame cameras, pushbroom sensors acquire each image row from a different perspective center, and as a result, the epipolar lines are neither straight lines nor are they conjugate across the image (Gupta and Hartley 1997). State-of-the-art approaches address this by either rectifying the image per patch, assuming that locally the epipolar curves are straight lines (de Franchis et al. 2014), or mapping the epipolar curves across the images, followed by a rectification transformation that straightens the curves into lines (Oh 2011; Pierrot-Deseilligny and Rupnik 2020).
Independent of the orientation model, refined sensor parameters still reveal errors related to the unmodeled satellite’s attitude and the CCD chip’s misalignment (Ayoub et al. 2008). In Figure 2.7 we can observe the effect such discrepancies have on image matching. The left image shows a displacement map obtained from matching of two ortho-rectified images. The lower left part of the image is corrupted with an oscillating signal along the vertical axis, caused by the satellite’s jitter. The induced artifacts can be up to several pixels and, therefore, for high accuracy measurements, must be eliminated. A common technique consists of modeling and subtracting the undulating signal in post-processing (Delorme et al. 2020; Michel et al. 2020).
We thank Renaud Binet and Amaury Dehecq for their careful reviews and comments that helped to improve this chapter.
d’Angelo, P. (2013). Automatic orientation of large multitemporal satellite image blocks. Proceedings of the International Symposium on Satellite Mapping Technology and Application, 1–6.
Ayoub, F., Leprince, S., Binet, R., Lewis, K.W., Aharonson, O., Avouac, J.-P. (2008). Influence of camera distortions on satellite image registration and change detection applications. IEEE International Geoscience and Remote Sensing Symposium, Boston, USA, 6–11 July.
De Lussy, F., Kubik, P., Greslou, D., Pascal, V., Gigord, P., Cantou, J.P. (2005). Pleiades–HR image system products and geometric accuracy. Proceedings of the International Society for Photogrammetry and Remote Sensing Workshop, Hannover, Germany.
Delorme, A., Grandin, R., Klinger, Y., Pierrot-Deseilligny, M., Feuillet, N., Jacques, E., Rupnik, E., Morishita, Y. (2020). Complex deformation at shallow depth during the 30 October 2016 Mw 6.5 Norcia earthquake: Interference between tectonic and gravity processes? Tectonics, 39(2), e2019TC005596.
de Franchis, C., Meinhardt-Llopis, E., Michel, J., Morel, J.-M., Facciolo, G. (2014). On stereo-rectification of pushbroom images. IEEE International Conference on Image Processing (ICIP), Paris, France, 27–30 October.
Fraser, C.S. and Hanley, H.B. (2005). Bias-compensated RPCs for sensor orientation of high-resolution satellite imagery. Photogrammetric Engineering and Remote Sensing, 71(8), 909–915.
Grodecki, J. and Dial, G. (2003). Block adjustment of high-resolution satellite images described by rational polynomials. Photogrammetric Engineering and Remote Sensing, 69(1), 59–68.
Gupta, R. and Hartley, R.I. (1997). Linear pushbroom cameras. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(9), 963–975.
Horn, B.K. and Schunck, B.G. (1981). Determining optical flow. Artificial Intelligence, 17(1–3), 185–203.
Kääb, A., Altena, B., Mascaro, J. (2017). Coseismic displacements of the 14 November 2016 Mw 7.8 Kaikoura, New Zealand, earthquake using the Planet optical Cubesat constellation. Natural Hazards and Earth System Sciences, 17, 627–639.
Leprince, S., Barbot, S., Ayoub, F., Avouac, J.-P. (2007). Automatic and precise orthorectification, coregistration, and subpixel correlation of satellite images, application to ground deformation measurements. IEEE Transactions on Geoscience and Remote Sensing, 45(6), 1529–1558.
Lier, P., Valorge, C., Briottet, X. (2008). Imagerie spatiale : des principes d’acquisition au traitement des images optiques pour l’observation de la terre. Editions Cépadues, Toulouse, France.
Lowe, D.G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.
Madden, F. (1999). The Corona camera system – Itek’s contribution to world security. JBIS, 52, 379–396.
McGlone, C. (ed.) (2013). Manual of Photogrammetry. American Society of Photogrammetry and Remote Sensing, Bethesda, MD, USA.
Michel, J., Sarrazin, E., Youssefi, D., Cournet, M., Buffe, F., Delvit, J., Emilien, A., Bosman, J., Melet, O., L’Helguen, C. (2020). A new satellite imagery stereo pipeline designed for scalability, robustness and performance. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2, 171–178.
Oh, J. (2011). Novel approach to epipolar resampling of HRSI and satellite stereo imagery-based georeferencing of aerial images. PhD thesis, Ohio State University, Ohio, USA.
Oh, J. and Lee, C. (2015). Automated bias-compensation of rational polynomial coefficients of high resolution satellite imagery based on topographic maps. ISPRS Journal of Photogrammetry and Remote Sensing, 100, 14–22.
OpenGIS Consortium (1999). The openGIS Abstract Specification – Topic 7: Earth Imagery Case.
Pierrot-Deseilligny, M. and Paparoditis, N. (2006). A multiresolution and optimization-based image matching approach: An application to surface reconstruction from SPOT5-HRS stereo imagery. ISPRS Workshop on Topographic Mapping from Space, Ankara, Turkey.
Pierrot-Deseilligny, M. and Rupnik, E. (2020). Epipolar rectification of a generic camera. Preprint [Online]. Available at: https://hal.archives-ouvertes.fr/hal-02968078.
Rosu, A., Pierrot-Deseilligny, M., Delorme, A., Binet, R., Klinger, Y. (2015). Measurement of ground displacement from optical satellite image correlation using the free open-source software MicMac. ISPRS Journal of Photogrammetry and Remote Sensing, 100, 48–59.
Rothermel, M., Wenzel, K., Fritsch, D., Haala, N. (2012). Sure: Photogrammetric surface reconstruction from imagery. LC3D Workshop, Berlin, Germany.
Rupnik, E., Deseilligny, M.P., Delorme, A., Klinger, Y. (2016). Refined satellite image orientation in the free open-source photogrammetric tools Apero/MicMac. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 3, 83.
Tao, C.V. and Hu, Y. (2001a). A comprehensive study of the rational function model for photogrammetric processing. Photogrammetric Engineering and Remote Sensing, 67(12), 1347–1358.
Tao, C.V. and Hu, Y. (2001b). Use of the rational function model for image rectification. Canadian Journal of Remote Sensing, 27(6), 593–602.
Viola, P. and Wells, W.M. (1997). Alignment by maximization of mutual information. International Journal of Computer Vision, 24(2), 137–154.
Žbontar, J. and LeCun, Y. (2016). Stereo matching by training a convolutional neural network to compare image patches. The Journal of Machine Learning Research, 17(1), 2287–2318.