CLEANing Cygnus A Deep and Fast with R2D2

Published 2024 May 7 © 2024. The Author(s). Published by the American Astronomical Society.
Citation: Arwa Dabbech et al 2024 ApJL 966 L34. DOI: 10.3847/2041-8213/ad41df


Abstract

A novel deep-learning paradigm for synthesis imaging by radio interferometry in astronomy was recently proposed, dubbed "Residual-to-Residual DNN series for high-Dynamic range imaging" (R2D2). In this work, we start by shedding light on R2D2's algorithmic structure, interpreting it as a learned version of CLEAN with minor cycles substituted with a deep neural network (DNN) whose training is iteration-specific. We then proceed with R2D2's first demonstration on real data, for monochromatic intensity imaging of the radio galaxy Cygnus A from S-band observations with the Very Large Array. We show that the modeling power of R2D2's learning approach enables delivering high-precision imaging, superseding the resolution of CLEAN, and matching the precision of modern optimization and plug-and-play algorithms, respectively uSARA and AIRI. Requiring few major-cycle iterations only, R2D2 provides a much faster reconstruction than uSARA and AIRI, known to be highly iterative, and is at least as fast as CLEAN.


Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

Modern radio telescopes are able to map the radio sky over large fields of view and wide frequency bandwidths with unprecedented depth and detail, owing to the sheer amount of data they acquire. Leveraging these capabilities in image formation raises significant data-processing challenges. The underlying radio-interferometric (RI) inverse problem calls for efficient imaging algorithms able to both deliver high-precision reconstruction via tailored regularization models, and scale to large data volumes and image dimensions. Since its inception by Högbom (1974), the CLEAN paradigm has been the standard for RI imaging owing to its simplicity and computational efficiency. In essence, CLEAN is a Matching Pursuit (MP) algorithm (Lannes et al. 1997), iteratively identifying model components by projecting the residual dirty image onto a sparsity dictionary, which in Högbom's version (hereafter Hö-CLEAN) is the identity basis. Its reconstruction is obtained by smoothing the identified model components via a restoring beam, and is complemented by the addition of the residual dirty image. Variants of CLEAN have been devised over five decades to overcome its shortfalls in terms of precision, stability, and scalability (e.g., Clark 1980; Wakker & Schwarz 1988; Bhatnagar & Cornwell 2004; Cornwell 2008). The Cotton-Schwab approach (CS-CLEAN; Schwab 1984) introduced a nested iterative structure, with an inner loop (minor cycles) identifying model components, whose exact contributions are removed from the RI data all at once at the next iteration of an outer loop (major cycle). The major cycle would typically exhibit only a few iterations, conferring on CLEAN its scalability. Multi-scale CLEAN (MS-CLEAN; Cornwell 2008) introduces a bespoke multiscale basis to substitute the identity basis, improving reconstruction quality.
However, CLEAN's restored image is by design restricted to instrumental resolution due to the restoring beam, and limited in dynamic range by the addition of the final residual dirty image.

Numerous algorithms for RI imaging emerged from optimization theory over the past two decades (e.g., Wiaux et al. 2009; Li et al. 2011; Carrillo et al. 2012; Dabbech et al. 2015; Garsden et al. 2015). Backed by compressive sensing theory, these methods inject handcrafted sparsity-based image models into the RI data, leveraging convergent optimization algorithmic structures. Their iterative structure is different from CLEAN's, but shares a two-step iteration process, alternating a data-fidelity step involving the computation of data residuals, and a regularization step enforcing the chosen image model. Their latest evolution, uSARA, is underpinned by the Forward–backward structure, which promotes an advanced handcrafted sparsity-based image model via a so-called "proximal regularization operator" (Repetti & Wiaux 2020; Terris et al. 2022). uSARA has been demonstrated on real gigabyte-scale RI data (Dabbech et al. 2022; Wilber et al. 2023a). Nonetheless, the complexity and the highly iterative nature of these methods have hindered their adoption by the wider radio astronomy community.

Novel algorithms for computational imaging empowered by deep learning have emerged across a wide range of applications, including radio astronomy. Purely data-driven end-to-end DNNs can provide real-time solutions to the RI inverse problem (Connor et al. 2022; Schmidt et al. 2022). However, they raise important interpretability and reliability concerns as standard architectures cannot ensure consistency of the recovered image with the observed data. Alternative plug-and-play (PnP) approaches, interfacing optimization theory and deep learning, can circumvent these concerns by implicitly injecting a learned image model into the data via pretrained denoising DNNs (see Kamilov et al. 2023, for a review). One such example is AIRI (Terris et al. 2022), which is underpinned by the same algorithmic structure as uSARA, with its regularization operator substituted with a denoising DNN. AIRI has been demonstrated to deliver slightly superior imaging precision than uSARA, while exhibiting a lower computational cost thanks to its DNN-encapsulated regularization process (Dabbech et al. 2022; Wilber et al. 2023b). However, PnP algorithms still share the highly iterative nature and associated limited scalability of their optimization counterparts. Unrolled DNNs represent another deep-learning approach, consisting in learning at once a finite number of iterations of an optimization structure (see Monga et al. 2021, for a review). They offer more interpretability than purely data-driven DNNs thanks to the integration of data-consistency layers. However, embedding measurement operators in the network architecture also becomes impractical at large scale, primarily for training, but also for image reconstruction.

Most recently, we have proposed a novel deep-learning approach dubbed R2D2, standing for Residual-to-Residual DNN series for high-Dynamic range imaging (Aghabiglou et al. 2024). R2D2's reconstruction is formed as a series of residual images, iteratively estimated as outputs of DNNs taking the previous iteration's image estimate and associated back-projected data residual as inputs.

In this work, we start by shedding light on R2D2's algorithmic structure, interpreting it as a learned version of CLEAN. Three R2D2 variants have been introduced and extensively studied in simulation, namely R2D2, R2D2-Net, and R3D3. We analyze their positions in the landscape of CLEAN variants, with specific comparison to Hö-CLEAN, CS-CLEAN, and MS-CLEAN. We then demonstrate the R2D2 paradigm on high-sensitivity Very Large Array (VLA) observations of the radio galaxy Cygnus A at S band, both in terms of precision and computational speed. First, the reconstruction quality of all R2D2 variants at least matches that of AIRI and uSARA, outperforming all CLEAN variants. Second, requiring only a few major-cycle iterations (on par with CLEAN), all R2D2 variants provide a much faster reconstruction than uSARA and AIRI, and are at least as fast as CLEAN.

The remainder of this paper is structured as follows. Section 2 introduces the RI inverse problem. Section 3 provides a summary of the R2D2 paradigm in its three variants. Section 4 presents a CLEAN perspective on the R2D2 paradigm. Application of telescope-specific incarnations of the R2D2 variants is provided in Section 5, for the formation of Cygnus A images from VLA observations. Conclusions are presented in Section 6.

2. RI Inverse Problem

Under the assumptions of nonpolarized monochromatic radio emission spanning a narrow field of view, RI data, also termed visibilities, are noisy Fourier measurements of the intensity image of interest. The sampled Fourier coverage is defined by the array antenna configuration and the specifications of the observation (e.g., direction of sight, total observation duration). Let ${{\boldsymbol{x}}}^{\star }\in {{\mathbb{R}}}_{+}^{N}$ denote a discrete representation of the unknown radio image. A discrete model of the RI data, denoted by ${\boldsymbol{y}}\in {{\mathbb{C}}}^{M}$, reads ${\boldsymbol{y}}={\boldsymbol{\Phi }}{{\boldsymbol{x}}}^{\star }+{\boldsymbol{n}}$, where ${\boldsymbol{n}}\in {{\mathbb{C}}}^{M}$ is an additive white Gaussian noise, with mean zero and standard deviation τ > 0. The RI measurement operator ${\boldsymbol{\Phi }}\in {{\mathbb{C}}}^{M\times N}$ is a Fourier sampling operator, often incorporating a data-weighting scheme to mitigate the nonuniform Fourier sampling and enhance the effective resolution of the data (e.g., Briggs weighting; Briggs 1995). In high-sensitivity regimes, the measurement operator model is more complex, encompassing direction-dependent effects (DDEs) induced by atmospheric perturbations and instrumental errors (Smirnov 2011).

The image-domain formulation of the RI inverse problem can be obtained by back-projecting the data ${\boldsymbol{y}}$ via ${{\boldsymbol{\Phi }}}^{\dagger }$, the adjoint of the measurement operator, as follows:

${{\boldsymbol{x}}}_{{\rm{d}}}={\bf\mathsf{D}}{{\boldsymbol{x}}}^{\star }+{\bf\mathsf{b}},$    (1)

where ${{\boldsymbol{x}}}_{{\rm{d}}}=\kappa \mathrm{Re}\{{{\boldsymbol{\Phi }}}^{\dagger }{\boldsymbol{y}}\}\in {{\mathbb{R}}}^{N}$ is the normalized back-projected data, often termed the dirty image. The normalization factor κ > 0 ensures that the point-spread function (PSF; ${\bf\mathsf{h}}\,=\kappa \mathrm{Re}\{{{\boldsymbol{\Phi }}}^{\dagger }{\boldsymbol{\Phi }}\}{\boldsymbol{\delta }}\in {{\mathbb{R}}}^{N}$, with ${\boldsymbol{\delta }}\in {{\mathbb{R}}}^{N}$ standing for the image with value one at its center and zero otherwise) has a peak value equal to one. The linear operator ${\bf\mathsf{D}}\triangleq \kappa \mathrm{Re}\{{{\boldsymbol{\Phi }}}^{\dagger }{\boldsymbol{\Phi }}\}\in {{\mathbb{R}}}^{N\times N}$ encodes Fourier degridding and gridding operations, and maps the image of interest to the dirty image space. Finally, the normalized back-projected noise reads ${\bf\mathsf{b}}=\kappa \mathrm{Re}\{{{\boldsymbol{\Phi }}}^{\dagger }{\boldsymbol{n}}\}\in {{\mathbb{R}}}^{N}$.
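The quantities above can be made concrete with a toy sketch, in which the measurement operator is approximated by a masked FFT (an assumption for illustration only; the actual operator involves degridding/gridding on nonuniform Fourier samples and a weighting scheme). The sketch forms the PSF, the normalization factor κ, and a dirty image:

```python
import numpy as np

# Toy sketch (not BASPLib): a discrete RI measurement operator as a masked FFT.
# 'mask' plays the role of the Fourier sampling pattern; kappa normalizes the
# PSF peak to one, as in the text.
rng = np.random.default_rng(0)
N = 64                                   # image side (total pixels N*N)
mask = rng.random((N, N)) < 0.3          # hypothetical Fourier coverage

def forward(x):                          # Phi: image -> sampled visibilities
    return np.fft.fft2(x)[mask]

def adjoint(y):                          # Phi^dagger: visibilities -> image
    grid = np.zeros((N, N), dtype=complex)
    grid[mask] = y
    return np.real(np.fft.ifft2(grid))

# PSF: back-projection of the response to a centered delta image
delta = np.zeros((N, N))
delta[N // 2, N // 2] = 1.0
psf_unnorm = adjoint(forward(delta))
kappa = 1.0 / psf_unnorm.max()           # ensures the PSF peak equals one
psf = kappa * psf_unnorm

# Dirty image: normalized back-projected data
x_true = np.abs(rng.normal(size=(N, N)))
x_dirty = kappa * adjoint(forward(x_true))
```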

3. The R2D2 Paradigm

3.1. Algorithmic Structure

The R2D2 algorithm (Aghabiglou et al. 2024) is underpinned by a sequence of DNN modules denoted by $\{{{\bf\mathsf{N}}}_{{\widehat{{\boldsymbol{\theta }}}}^{(i)}}\}{}_{1\leqslant i\leqslant I}$, which are described by their learned parameters $\{{\widehat{{\boldsymbol{\theta }}}}^{(i)}\in {{\mathbb{R}}}^{Q}\}{}_{1\leqslant i\leqslant I}$. With the image estimate being initialized as ${{\boldsymbol{x}}}^{(0)}={\bf{0}}\in {{\mathbb{R}}}^{N}$, at any iteration i ∈ {1,...,I}, the network ${{\bf\mathsf{N}}}_{{\widehat{{\boldsymbol{\theta }}}}^{(i)}}$ takes as input both the previous image estimate ${{\boldsymbol{x}}}^{(i-1)}$ and the associated residual dirty image ${{\boldsymbol{r}}}^{(i-1)}$. The latter is updated from the dirty image by removing the contribution of the previous image estimate: ${{\boldsymbol{r}}}^{(i-1)}={{\boldsymbol{x}}}_{{\rm{d}}}-{\bf\mathsf{D}}{{\boldsymbol{x}}}^{(i-1)}$. The current image estimate is updated from the output of the network as

${{\boldsymbol{x}}}^{(i)}={\left[{{\boldsymbol{x}}}^{(i-1)}+{{\bf\mathsf{N}}}_{{\widehat{{\boldsymbol{\theta }}}}^{(i)}}({{\boldsymbol{r}}}^{(i-1)},{{\boldsymbol{x}}}^{(i-1)})\right]}_{+},$    (2)

where the predicted residual image ${{\bf\mathsf{N}}}_{{\widehat{{\boldsymbol{\theta }}}}^{(i)}}({{\boldsymbol{r}}}^{(i-1)},{{\boldsymbol{x}}}^{(i-1)})$ captures emission from the input residual dirty image r (i−1), and corrects for estimation errors in x (i−1), thus progressively enhancing the resolution and dynamic range of the reconstruction. Assuming I DNN modules in the R2D2 sequence, the final reconstruction $\widehat{{\boldsymbol{x}}}$ takes a simple series expression, as the sum of output residual images from all DNN modules. The iteration structure of R2D2 is reminiscent of MP, but its model components at each iteration are identified in a learned manifold of basis functions encoded by a DNN, rather than a handcrafted sparsity basis fixed across iterations.
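The iterative structure described above can be sketched as follows, with hypothetical stand-ins for the trained DNN modules and the operator D (names are illustrative, not the BASPLib API):

```python
import numpy as np

# Sketch of the R2D2 iteration: each module predicts a residual image from the
# current residual dirty image and image estimate, and the residual dirty
# image is then updated via the operator D.
def r2d2_reconstruct(x_dirty, D, dnn_modules):
    x = np.zeros_like(x_dirty)              # x^(0) = 0
    r = x_dirty                             # r^(0) = dirty image
    for net in dnn_modules:                 # i = 1, ..., I
        x = np.maximum(x + net(r, x), 0.0)  # [.]_+ keeps the estimate nonnegative
        r = x_dirty - D(x)                  # residual dirty image update
    return x                                # sum of the predicted residual images
```

With toy modules that each capture half of the remaining residual (net(r, x) = 0.5 r) and D the identity, the estimate converges geometrically toward the dirty image, mimicking the progressive refinement described in the text.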

3.2. Training Procedure

R2D2 DNN modules are trained in a sequential manner from a data set consisting of K image pairs of the ground-truth images and their associated dirty images $\{{{\boldsymbol{x}}}_{k}^{\star },{{{\boldsymbol{x}}}_{{\rm{d}}}}_{k}\}{}_{1\leqslant k\leqslant K}$. At any iteration i ≥ 1, the network ${{\bf\mathsf{N}}}_{{\widehat{{\boldsymbol{\theta }}}}^{(i)}}$ is trained taking the previous image estimates and associated residual dirty images $\{{{\boldsymbol{x}}}_{k}^{(i-1)},{{\boldsymbol{r}}}_{k}^{(i-1)}\}{}_{1\leqslant k\leqslant K}$ as input, with $\{{{\boldsymbol{x}}}_{k}^{(0)}={\bf{0}}\}{}_{1\leqslant k\leqslant K}$ and $\{{{\boldsymbol{r}}}_{k}^{(0)}={{{\boldsymbol{x}}}_{{\rm{d}}}}_{k}\}{}_{1\leqslant k\leqslant K}$. Its learned parameter vector ${\widehat{{\boldsymbol{\theta }}}}^{(i)}$ is a minimizer of the loss function:

${\widehat{{\boldsymbol{\theta }}}}^{(i)}\in \mathop{\mathrm{argmin}}\limits_{{\boldsymbol{\theta }}}\ \frac{1}{K}\sum\limits_{k=1}^{K}\parallel {{\boldsymbol{x}}}_{k}^{\star }-{\left[{{\boldsymbol{x}}}_{k}^{(i-1)}+{{\bf\mathsf{N}}}_{{\boldsymbol{\theta }}}({{\boldsymbol{r}}}_{k}^{(i-1)},{{\boldsymbol{x}}}_{k}^{(i-1)})\right]}_{+}{\parallel }_{1},$    (3)

where ∥. ∥1 stands for the 1-norm, and [.]+ denotes the projection onto the positive orthant ${{\mathbb{R}}}_{+}^{N}$, ensuring nonnegativity of the image estimate at any point in the iterative sequence. The image estimates and associated residual dirty images $\{{{\boldsymbol{x}}}_{k}^{(i)},{{\boldsymbol{r}}}_{k}^{(i)}\}{}_{1\leqslant k\leqslant K}$ are updated from the outputs of the trained network. Alongside their corresponding ground-truth images, they serve as the training data set for the subsequent DNN module. Training of the DNN series concludes when the reconstruction metrics on the validation data set reach a point of saturation.
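For concreteness, the iteration-specific loss can be sketched in plain numpy (a hypothetical `net` stands in for the DNN module; actual training is of course gradient-based on GPUs):

```python
import numpy as np

# Sketch of the iteration-specific training loss: an l1 distance between the
# ground truth and the positivity-projected updated estimate, averaged over
# the K training pairs. Names are illustrative, not the BASPLib API.
def r2d2_loss(net, x_true_batch, x_prev_batch, r_prev_batch):
    total = 0.0
    for x_true, x_prev, r_prev in zip(x_true_batch, x_prev_batch, r_prev_batch):
        x_new = np.maximum(x_prev + net(r_prev, x_prev), 0.0)  # [.]_+ projection
        total += np.abs(x_true - x_new).sum()                  # l1-norm
    return total / len(x_true_batch)
```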

3.3. R2D2 Variants

The R2D2 algorithm takes three variants. The first, simply referred to as R2D2 henceforth, adopts the well-established U-Net as the architecture of its DNN modules. The second, called R2D2-Net, is an unrolled version of R2D2 itself, trained end-to-end on GPUs. Its architecture consists of J U-Net layers interlaced with J − 1 data-consistency layers. The latter layers leverage a fast and memory-efficient PSF-based approximation of the operator D for the computation of residual images, which is instrumental to alleviate the scalability challenges encountered by unrolled deep-learning approaches. The third is a nested R2D2 architecture, where the U-Net modules of R2D2 are substituted with R2D2-Net. In reference to its nested structure, this variant is dubbed "Russian Dolls" R2D2, in short R3D3. In this landscape, the R2D2-Net variant, as DNN module to R3D3, also represents R3D3's first iteration.
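A minimal sketch of R2D2-Net's unrolled forward pass follows, assuming the PSF-based approximation of D reduces to an FFT convolution with a centered, peak-normalized PSF (`unet_layers` is a hypothetical stand-in for the trained U-Net layers):

```python
import numpy as np

# Sketch of R2D2-Net's unrolled structure: J learned layers interlaced with
# J-1 PSF-based data-consistency layers, the latter approximating D by a
# circular convolution with the PSF, computed in the Fourier domain.
def r2d2_net_forward(x_dirty, psf, unet_layers):
    psf_hat = np.fft.fft2(np.fft.ifftshift(psf))   # precompute PSF spectrum
    x = np.zeros_like(x_dirty)
    r = x_dirty
    for j, unet in enumerate(unet_layers):         # J U-Net layers
        x = np.maximum(x + unet(r, x), 0.0)
        if j < len(unet_layers) - 1:               # J-1 data-consistency layers
            conv = np.real(np.fft.ifft2(np.fft.fft2(x) * psf_hat))
            r = x_dirty - conv                     # approximate residual update
    return x
```

Since the PSF convolution is a fixed, differentiable linear map, the whole unrolled network can be trained end-to-end without embedding the full measurement operator, which is the scalability point made above.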

4. When R2D2 Meets CLEAN

In exploring the joint landscape of R2D2 and CLEAN paradigms, we rely on three main features: (i) the iterative structure, either nested or nonnested; (ii) the data model used to update the residual dirty images, relying either on the exact measurement operator and nongridded visibilities, or a PSF-based approximation operating on gridded data; (iii) the model component basis, either handcrafted (identity or multiscale) or learned. In what follows, CLEAN and R2D2 variants are characterized and compared via the lens of this threefold categorization. Table 1 summarizes this landscape.

Table 1. Joint Landscape of R2D2 and CLEAN Paradigms

                      Hö-CLEAN    R2D2-Net    R2D2        CS-CLEAN    MS-CLEAN    R3D3
Iterative structure   nonnested   nonnested   nonnested   nested      nested      nested
Data model            gridded     gridded     nongridded  nongridded  nongridded  nongridded
Component basis       identity    learned     learned     identity    multiscale  learned

In the CLEAN paradigm, Hö-CLEAN exhibits a nonnested iterative structure, taking the dirty image as input. At each iteration, model components are identified in the identity basis from the residual dirty image. The latter is modeled as a convolution of the sought image with the PSF, representing an approximation to the data model. CS-CLEAN exhibits a nested iterative structure, also taking the dirty image as input. The minor cycle serves as a regularization step, with the image estimate updated with multiple model components in the identity basis. The major cycle ensures data fidelity through accurate updates of the residual dirty image, whereby the contribution of the current image estimate is removed from the nongridded visibilities using the exact measurement operator. Finally, MS-CLEAN can benefit from the same "major–minor" cycle structure as CS-CLEAN, while featuring a bespoke multiscale component basis in lieu of the identity basis.
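The minor-cycle principle shared by these variants can be illustrated with a minimal Hö-CLEAN-style loop (a textbook sketch, not the WSClean implementation): repeatedly pick the peak of the residual dirty image and subtract a scaled, shifted copy of the PSF.

```python
import numpy as np

# Minimal Hö-CLEAN-style minor loop (illustrative). 'psf' is assumed
# peak-normalized with its peak at the image center; circular shifts via
# np.roll stand in for proper PSF shifting at the image borders.
def hogbom_minor_cycle(dirty, psf, gain=0.1, n_iter=200):
    model = np.zeros_like(dirty)
    resid = dirty.copy()
    c = np.array(psf.shape) // 2                   # PSF peak position
    for _ in range(n_iter):
        p = np.unravel_index(np.argmax(resid), resid.shape)
        flux = gain * resid[p]
        model[p] += flux                           # identity-basis component
        shifted = np.roll(psf, (p[0] - c[0], p[1] - c[1]), axis=(0, 1))
        resid -= flux * shifted                    # remove PSF contribution
    return model, resid
```

In CS-CLEAN terms, such a loop would run inside each major cycle, with the residual dirty image then recomputed exactly from the nongridded visibilities.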

In the R2D2 paradigm, all variants also take the dirty image as input. The unrolled R2D2-Net alternates between DNN modules, serving as regularization layers, and approximate PSF-based data-consistency layers. It thus closely mirrors the iteration structure of Hö-CLEAN, sharing a nonnested structure and a PSF-based data model. However, its model components are identified at each regularization layer in a learned manifold of basis functions encoded by the corresponding U-Net, rather than the identity basis. R2D2 itself can be interpreted as a learned version of a hybrid CLEAN at the interface of Hö-CLEAN, CS-CLEAN, and MS-CLEAN. It indeed features the exact data model of CS-CLEAN and MS-CLEAN for the update of the residual dirty images, but is underpinned by a nonnested structure, as is Hö-CLEAN. Its model components are identified at each iteration in a learned manifold of basis functions encoded by the corresponding U-Net module, rather than a handcrafted (identity or multiscale) basis. Finally, R3D3 exhibits a nested structure matching the major–minor cycle structure of CS-CLEAN and MS-CLEAN. Its R2D2-Net modules serve as its minor cycles, just as Hö-CLEAN (respectively, a multiscale variant) serves as CS-CLEAN's (respectively, MS-CLEAN's) minor-cycle structure. Its model components are identified at each iteration in the same learned manifold of basis functions as R2D2.

5. Application to Cygnus A

5.1. Observation and Imaging Settings

The Cygnus A data considered were acquired at S band (2.052 GHz) with the following observation setting. VLA configurations A and C were combined, with a single pointing centered at the inner core of Cygnus A, given by the coordinates R.A. = 19h59m28.356s (J2000) and decl. = +40°44′02″.07. The total observation duration is about 19 hr, conducted over multiple scans, combining 7 and 12 hr with configurations A and C, respectively. The data are single-channel, acquired with an integration time step of 2 s and a channel width of 2 MHz. Careful calibration of the direction-independent effects was performed in AIPS (Sebokolodi et al. 2020), following which the data were averaged over 10 s, totaling M = 9.6 × 10⁵ points. Dabbech et al. (2021) showed that these data could also benefit from the calibration of DDEs, likely attributed to pointing errors at the hotspots. However, DDE solutions from the underpinning joint calibration and imaging framework (Repetti et al. 2017) are not considered in this study as none of the benchmark algorithms aside from uSARA and AIRI could benefit from them.

Regarding the imaging setting, we target forming an image of size N = 512 × 512, with a pixel size of about 0.29″, corresponding to a superresolution factor of 1.5, from Briggs-weighted data with the Briggs parameter set to 0 (weights are generated in WSClean; Offringa & Smirnov 2017). The target dynamic range of the reconstruction is about 1.7 × 10⁵, and is inferred from the data as the ratio between the peak pixel value of the sought image and the image-domain noise level, estimated as ${\eta }_{\mathrm{Briggs}}\,\tau /\sqrt{2\parallel {{\boldsymbol{\Phi }}}^{\dagger }{\boldsymbol{\Phi }}{\parallel }_{S}}$, with ${\eta }_{\mathrm{Briggs}}\gt 0$ being a correction factor accounting for the Briggs weights (Terris et al. 2022; Wilber et al. 2023a), and ∥. ∥S denoting the spectral norm of its argument operator.
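The dynamic-range estimate can be reproduced in a toy setting, with the spectral norm obtained by power iteration (the operator and numbers below are illustrative stand-ins, not the actual VLA measurement operator or noise values):

```python
import numpy as np

# Sketch of the target-dynamic-range estimate: peak pixel value divided by the
# image-domain noise level eta * tau / sqrt(2 * ||Phi^dagger Phi||_S), with the
# spectral norm computed by power iteration on Phi^dagger Phi.
def spectral_norm(op, shape, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    v = rng.normal(size=shape)
    for _ in range(n_iter):         # power iteration
        v = op(v)
        v /= np.linalg.norm(v)
    return np.linalg.norm(op(v))    # dominant eigenvalue of op

op = lambda x: 4.0 * x              # toy stand-in for Phi^dagger Phi
norm_s = spectral_norm(op, (16, 16))
peak, tau, eta = 1.0, 1e-4, 1.0     # illustrative peak, noise std, Briggs factor
dynamic_range = peak * np.sqrt(2.0 * norm_s) / (eta * tau)
```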

5.2. R2D2 Incarnations and Implementations

We have borrowed the first incarnations of the R2D2 variants from Aghabiglou et al. (2024). These were trained following a telescope-specific approach, specifically for the VLA. R2D2 and R3D3 were respectively trained with I = 15 U-Net modules and I = 8 R2D2-Net modules, all sharing the same architecture composed of J = 6 U-Net layers. The R2D2-Net incarnation considered is exactly the first DNN module of the R3D3 incarnation.

The training data set was composed of 20,000 pairs of ground-truth images and associated dirty images of size N = 512 × 512. Synthetic ground-truth images, endowed with high dynamic ranges in the interval [10³, 5 × 10⁵], were created from a low-dynamic-range database of optical astronomy and medical images via denoising and exponentiation procedures (Terris et al. 2022, 2023). Targeting a robust incarnation, usable across a wide landscape of VLA observation settings, data were created from a variety of Fourier sampling patterns generated using MeqTrees (Noordam & Smirnov 2010), combining configurations A and C. The sampling patterns were generated by sampling uniformly at random: (i) the pointing direction; (ii) the temporal parameters (total observation duration assuming a single scan and a constant time step of 36 s, for a combined duration at both configurations ranging between 6 and 13 hr); (iii) the spectral parameters (number of consecutive frequency channels and frequency channel bandwidth), under the assumption of flat spectra of the radio emission. Additionally, random flagging of up to 20% of the Fourier samples was applied. The simulated visibilities, of size M ∈ [0.2, 2] × 10⁶, were corrupted with noise levels commensurate with the dynamic ranges of the corresponding ground-truth images. In order to contain the number of varying parameters in the training, the imaging setting was fixed, with an N = 512 × 512 image size, and dirty images obtained: (i) using the same weighting scheme across all data, that is, Briggs weighting with its parameter set to 0; (ii) with a pixel size ensuring a superresolution factor of 1.5.

We note that, while the imaging setting for the pretrained R2D2 variants and the target image reconstruction of Cygnus A are the same, the VLA temporal parameters of the observation settings (total duration, integration time step, number of scans) belong to quite different categories. Our working hypothesis in using these pretrained variants is that the wide variety of observation settings visited at the training stage endows the DNN modules with the necessary robustness, when interlaced with the update of the residual dirty image at each iteration, to generalize across different categories of observation settings. The same question in fact formally holds with regard to the target category of images, as the training data set contains no radio image whatsoever. The capability to train on synthetic data and reconstruct radio images was extensively validated in simulation (Aghabiglou et al. 2024), but remains to be demonstrated on real data.

Python and MATLAB implementations of the R2D2 variants that can be executed on flexible hardware are available as part of the BASPLib code library. Extensive validation of image reconstruction in simulation, as well as a detailed analysis of their computational performance at both training and imaging stages are provided in Aghabiglou et al. (2024). In this study, image reconstruction with R2D2 variants was conducted using their Python implementation, executed on a single GPU.

5.3. Benchmark Algorithms

The performance of the three R2D2 variants is studied in comparison with Hö-CLEAN, CS-CLEAN, MS-CLEAN, uSARA, and AIRI. Parameters of the CLEAN variants were adjusted to ensure a good compromise between speed and data fidelity, while achieving a good reconstruction quality. Both uSARA and AIRI are shipped with automated noise-driven heuristics for the choice of their regularization parameter (Terris et al. 2022; Wilber et al. 2023a). However, an adjustment within 1 order of magnitude from their heuristic values appeared to be necessary for best results. For the CLEAN variants, we have considered the widely used C++ software WSClean (Offringa & Smirnov 2017; see the Appendix for full commands). Both uSARA and AIRI are implemented in MATLAB as part of the BASPLib code library. All CLEAN variants and uSARA were executed on a single CPU core. AIRI was executed on a single CPU core for its data-fidelity steps and a single GPU for its regularization steps (denoising DNNs).

5.4. Imaging Results

Cygnus A images obtained by the different imaging methods are displayed in ${\mathrm{log}}_{10}$ scale in Figures 1–4. Reconstructions are overlaid with additional panels consisting of (a) the associated residual dirty images, displayed in linear scale to visually assess the fidelity to back-projected data, and zooms on selected regions of the radio galaxy, all displayed in ${\mathrm{log}}_{10}$ scale: (b) the inner core, (c) the east hotspots, and (d) the west hotspots. The overall visual inspection of Cygnus A reconstructions shows that R2D2 variants exhibit higher resolution than CLEAN variants, while generally corroborating the depictions achieved by AIRI and uSARA. They provide deep reconstructions, whose pixel values span nearly 5 orders of magnitude, in line with the target dynamic range estimate. A close-up inspection indicates that both R3D3 (Figure 3, bottom) and AIRI (Figure 4, bottom) stand out, owing to their high levels of detail and their limited amount of patterns that could be construed as artifacts. R2D2 (Figure 2, bottom) and R2D2-Net (Figure 3, top) seem to lack details in the faint extended emission. uSARA (Figure 4, top) depicts spurious ringing and wavelet-like patterns. As expected, Hö-CLEAN (Figure 1, top) delivers a poor reconstruction with severely limited dynamic range, due to its inherently approximate data model. Both CS-CLEAN (Figure 1, bottom) and MS-CLEAN (Figure 2, top) provide much improved reconstructions, with the former exhibiting grid-like artifacts due to its sparsity (identity) basis being inadequate for the complex target radio source.

Figure 1. Cygnus A: reconstructions of Hö-CLEAN (top) and CS-CLEAN (bottom), displayed in ${\mathrm{log}}_{10}$ scale (negative pixels set to zero for visualization purposes). Their associated residual dirty images are provided in linear scale in panels (a), with standard deviation values 359 × 10⁻⁴ and 8.6 × 10⁻⁴, respectively. Reconstructions are overlaid by zooms on key regions of the radio galaxy: the inner core of Cygnus A in panels (b), the east hotspots in panels (c), and the west hotspots in panels (d), all displayed in ${\mathrm{log}}_{10}$ scale. Reconstructions are available in FITS format in the data set Dabbech et al. (2024).

Figure 2. Cygnus A: reconstructions of MS-CLEAN (negative pixels set to zero for visualization purposes; top) and R2D2 (bottom), both displayed in ${\mathrm{log}}_{10}$ scale. Their associated residual dirty images are provided in linear scale in panels (a), with standard deviation values 10.4 × 10⁻⁴ and 11.7 × 10⁻⁴, respectively. Reconstructions are overlaid by zooms on key regions of the radio galaxy: the inner core of Cygnus A in panels (b), the east hotspots in panels (c), and the west hotspots in panels (d), all displayed in ${\mathrm{log}}_{10}$ scale. Reconstructions are available in FITS format in the data set Dabbech et al. (2024).

Figure 3. Cygnus A: reconstructions of R2D2-Net (also the first iteration of R3D3; top) and R3D3 (bottom), both displayed in ${\mathrm{log}}_{10}$ scale. Their associated residual dirty images are provided in linear scale in panels (a), with standard deviation values 13.4 × 10⁻⁴ and 9.7 × 10⁻⁴, respectively. Reconstructions are overlaid by zooms on key regions of the radio galaxy: the inner core of Cygnus A in panels (b), the east hotspots in panels (c), and the west hotspots in panels (d), all displayed in ${\mathrm{log}}_{10}$ scale. Reconstructions are available in FITS format in the data set Dabbech et al. (2024).

Figure 4. Cygnus A: reconstructions of uSARA (top) and AIRI (bottom), both displayed in ${\mathrm{log}}_{10}$ scale. Their associated residual dirty images are provided in linear scale in panels (a), with standard deviation values 7.2 × 10⁻⁴ and 7.4 × 10⁻⁴, respectively. Reconstructions are overlaid by zooms on key regions of the radio galaxy: the inner core of Cygnus A in panels (b), the east hotspots in panels (c), and the west hotspots in panels (d), all displayed in ${\mathrm{log}}_{10}$ scale. Reconstructions are available in FITS format in the data set Dabbech et al. (2024).


The inner core. The inner core consists of the point-like active galactic nucleus (AGN) of Cygnus A, from which two jets emanate (panels (b) of all figures). The reconstructions of R2D2 variants, uSARA, and AIRI show a superresolved depiction of the region. In particular, R2D2 variants exhibit continuous emission between the AGN and both jets. uSARA exhibits wavelet-like artifacts around the AGN. CLEAN variants provide an unresolved depiction of the source due to the restoring beam.

The lobes. Examination of the west and east lobes of Cygnus A highlights the ability of R2D2 variants to provide a more physical depiction of their filamentary structure than the benchmark algorithms. On the one hand, CLEAN variants deliver a smooth reconstruction. On the other hand, uSARA, and to a much lesser extent AIRI, exhibit ringing artifacts in the west lobe (pointed at with green arrows in Figure 4). These artifacts are likely induced by pointing errors at the hotspots, resulting in the overfitting of the high-spatial-frequency content of the data by both uSARA and AIRI. Joint DDE calibration and imaging (using either AIRI or uSARA as the imaging module) can drastically reduce (if not remove) these artifacts (see their corresponding reconstructions provided in Dabbech et al. 2024). These findings suggest that R2D2 variants may be less prone to calibration errors than AIRI and uSARA.

Focusing on the faint diffuse emission at the tails of the lobes, both R3D3 and AIRI provide a consistent depiction. When flipping between their associated figures (Figures 3 and 4, bottom), the R3D3 reconstruction appears sharper. One such example is the faint filamentary structure at the tail of the east lobe. However, one cannot ascertain whether this sharpness is physical, particularly in the absence of DDE calibration. Both R2D2 (Figure 2, bottom) and R2D2-Net (Figure 3, top) provide a rather smooth structure. uSARA introduces wavelet-like patterns (pointed at with red arrows in Figure 4, top). The faint emission under scrutiny is completely buried in the noise in the Hö-CLEAN reconstruction (Figure 1, top). A blurry and noisy depiction emerges in the reconstructions of both CS-CLEAN (Figure 1, bottom) and MS-CLEAN (Figure 2, top).

The hotspots. As recovered by R2D2 variants, AIRI and uSARA, the hotspots highlight the ability of these algorithms to resolve physical structure beyond instrumental resolution, in contrast with CLEAN variants (see panels (c) and (d) of all figures). Interestingly, where AIRI and uSARA exhibit artificial zero-valued pixels around the hotspots, all R2D2 variants depict continuous emission. This observation suggests the ability of R2D2 variants to achieve a more physical reconstruction.

Image-domain data fidelity. We evaluate the data fidelity delivered by the imaging algorithms by scrutinizing their residual dirty images (i.e., back-projected data residuals), displayed in panels (a) of all figures, whose standard deviation values are reported in the corresponding captions. AIRI, uSARA, and CS-CLEAN obtain residual dirty images with the lowest standard deviation values, reflecting their high data fidelity. Both MS-CLEAN and the R2D2 variants provide comparable values, slightly above the best-performing algorithms. Among the R2D2 variants, R3D3 performs better than both R2D2 and R2D2-Net, owing to its underpinning model-informed DNN modules on the one hand, and its iterative structure on the other. Hö-CLEAN delivers the lowest fidelity, with its standard deviation value more than 1 order of magnitude higher than the rest. This numerical analysis aligns with the overall visual examination. A closer inspection of the R2D2 variants reveals discernible structure at the pixel positions of the western and eastern hotspots, where the highest emission is concentrated. While similar behavior has been observed in high-dynamic-range acquisition regimes in simulation (Aghabiglou et al. 2024), these patterns are possibly amplified by the pointing errors at the hotspots.

It is worth noting that the quest for noise-like data residuals is only meaningful in combination with the recovery of a model image satisfying physical constraints. CLEAN variants accept negative components, which eases the reduction of their data residuals. In contrast, the R2D2 variants, uSARA, and AIRI explicitly enforce the positivity of the recovered intensity images.
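As a minimal illustration (not the authors' implementation), enforcing positivity amounts to projecting each image estimate onto the nonnegative orthant, whereas CLEAN retains negative model components as-is:

```python
import numpy as np

# Hypothetical image update with negative components, as could arise in an
# unconstrained step; projecting onto the nonnegative orthant enforces the
# physical constraint that intensity is nonnegative.
x_update = np.array([0.8, -0.2, 1.5, -0.05])  # toy values, for illustration
x_projected = np.maximum(x_update, 0.0)       # enforce intensity >= 0

assert (x_projected >= 0).all()
assert np.allclose(x_projected, [0.8, 0.0, 1.5, 0.0])
```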

Computational cost. Computational details of the imaging algorithms are provided in Table 2. The R2D2 variants are able to deliver reconstructions in a few seconds only. We note that the reported times include a computational overhead for initializing the DNN modules at the first iteration, inducing a large difference in the average time of the regularization step between R2D2-Net on the one hand, and R2D2 and R3D3 on the other. Both CS-CLEAN and MS-CLEAN require around half a minute, whereas the highly iterative algorithms AIRI and uSARA take about 10 and 20 minutes, respectively. When compared to CS-CLEAN and MS-CLEAN, both R2D2 and R3D3 perform a comparable number of passes through the nongridded visibilities. Their DNN modules provide a boost to their speed in comparison with the minor cycles of the advanced CLEAN variants. R2D2-Net is also at least 1 order of magnitude faster than its CLEAN counterpart Hö-CLEAN. However, one must acknowledge the differences in implementation and hardware between the imaging algorithms, reflected in the varying average times of their data-fidelity steps (all calling for the computation of residual dirty images).

Table 2. Imaging Computational Details

Algorithm   I      ncpu   ngpu   ttot. (s)   tdat. (s)   treg. (s)
Hö-CLEAN    1      1      –      21.87       –           21.87
CS-CLEAN    7      1      –      33.69       2.38        2.43
MS-CLEAN    9      1      –      39.01       2.50        1.83
R2D2        15     –      1      3.31        0.13        0.10
R2D2-Net    1      –      1      0.97        –           0.97
R3D3        8      –      1      1.83        0.06        0.18
uSARA       477    1      1      1197        0.64        1.83
AIRI        1783   1      1      672         0.32        0.05

Note. I is the number of iterations; ncpu and ngpu correspond to the numbers of allocated CPU cores and GPUs, respectively; ttot. is the total imaging time; tdat. and treg. correspond to the average time per iteration of the data-fidelity step and the regularization step, respectively. The time to compute the dirty image is excluded. For CLEAN variants, I is the number of iterations in the major cycle, tdat. is the average time to update the residual dirty image, and treg. is the average time taken by a minor cycle. A dash (–) indicates a nonapplicable field.


Of note, the training of the R2D2 variants required on the order of thousands of CPU core hours and GPU hours (Aghabiglou et al. 2024). This is, however, a one-off cost accommodating any number of subsequent reconstructions from VLA data, as exemplified by the fact that the present analysis did not require any further training.

6. Conclusions

We have shed light on R2D2's algorithmic structure, interpreting it as a learned version of CLEAN. The R2D2, R2D2-Net, and R3D3 variants have been carefully positioned in the landscape of Hö-CLEAN, CS-CLEAN, and MS-CLEAN. In this context, we have provided the first demonstration of R2D2 on real data, for monochromatic intensity imaging of the radio galaxy Cygnus A from S-band observations with the VLA. The R2D2 variants deliver imaging quality superior to CLEAN's, while offering some acceleration potential owing to the substitution of minor cycles with DNNs. In comparison to uSARA and AIRI, the R2D2 variants at least match their imaging precision, at a fraction of the cost. Moreover, the regularization parameters of uSARA and AIRI were adjusted manually, whereas R2D2 runs in a fully automated manner. These results confirm that R2D2's learned approach offers immense potential for fast precision RI imaging, already generalizing across categories of images and observation settings.

Future research includes investigating the robustness of the R2D2 paradigm to accommodate a much wider variety of observation and imaging settings, ranging from varying data-weighting schemes to flexible superresolution factors and image dimensions, and ultimately to an all-instrument-encompassing incarnation. Further developments toward improving its capability to deliver new regimes of precision are warranted, e.g., by leveraging novel DNN architectures (including diffusion models; Ho et al. 2020) and training loss functions for more efficient and physics-informed training and reconstruction. Finally, endowing the R2D2 paradigm with an image-splitting procedure, as implemented in DDFacet (Tasse et al. 2018), WSClean (Offringa & Smirnov 2017), and Faceted HyperSARA (Thouvenin et al. 2023), is a necessary evolution to efficiently handle large image sizes. In a nutshell, not only has CLEAN been leading the way for decades, but its algorithmic structure might well form the backbone of the next generation of deep-learning-based imaging algorithms for radio astronomy.

Acknowledgments

The authors warmly thank R.A. Perley (NRAO) for providing the VLA observations of Cygnus A. The research of A.A., A.D., and Y.W. was supported by the UK Research and Innovation under the EPSRC grant EP/T028270/1 and the STFC grant ST/W000970/1. The research was conducted using Cirrus, a UK National Tier-2 HPC Service at EPCC funded by the University of Edinburgh and EPSRC (EP/P020267/1). The authors thank Adrian Jackson for related support. The National Radio Astronomy Observatory is a facility of the National Science Foundation operated under cooperative agreement by Associated Universities, Inc.

Facility: VLA - Very Large Array

Software: WSClean (Offringa & Smirnov 2017); Meqtrees (Noordam & Smirnov 2010); BASPLib (https://basp-group.github.io/BASPLib/).

Data Availability

R2D2 codes are available alongside the AIRI and uSARA codes in the BASPLib code library on GitHub. BASPLib is developed and maintained by the Biomedical and Astronomical Signal Processing Laboratory (BASP). The VLA-trained R2D2 and R3D3 DNN series are available in the data set Aghabiglou et al. (2024). Cygnus A reconstructions are available in FITS format in the data set Dabbech et al. (2024). Observations of Cygnus A were provided by the National Radio Astronomy Observatory (NRAO; Program code: 14B-336). The self-calibrated data can be shared upon request to R.A. Perley (NRAO).

Appendix: WSClean Commands

Commands used in WSClean to generate the reconstructions of the CLEAN variants.

MS-CLEAN: wsclean -size 512 512 -scale 0.29170266asec -weight briggs 0 -mgain 0.8 -gain 0.1 -auto-threshold .5 -auto-mask 1.5 -multiscale -padding 2 -niter 2000000 -j 1

CS-CLEAN: wsclean -size 512 512 -scale 0.29170266asec -weight briggs 0 -mgain 0.8 -gain 0.1 -auto-threshold .5 -auto-mask 1.5 -padding 2 -niter 2000000 -j 1

Hö-CLEAN: wsclean -size 1024 1024 -scale 0.29170266asec -weight briggs 0 -mgain 1 -gain 0.1 -threshold 0.001Jy -auto-mask 1 -padding 2 -nmiter 1 -niter 500000 -j 1
