2012-10-26

Inside-out map-making

If the time-series data supplied to SMURF:MAKEMAP does not consist of a single continuous sequence of time samples, MAKEMAP divides the data up into two or more "chunks" of continuous samples. A separate map is made from each chunk in turn, using the iterative procedure described in SUN/258, and these maps are simply co-added at the end to create the final returned map. This procedure can be summarised as follows:

initialise final map to zero
for each chunk:
   initialise chunk map to zero
   for iteration = 1 to numiter
      update chunk map using time-series data in current chunk
   next iteration
   finalise chunk map
   add the chunk map into the total map
next chunk

In general, it seems that the more data included in each iteration, the more constrained the map, and so the fewer the spurious features. So it would be nice if we could turn these loops "inside out", so that the iteration loop occurs outside the chunk loop, causing data from all chunks to be included in the map at the end of each iteration:

initialise final map to zero
for iteration = 1 to numiter:
   for each chunk:
      update final map using time-series data in current chunk
   next chunk
next iteration
finalise final map

As an aid to experimentation, I have added some simple features to MAKEMAP that allow something like this effect to be achieved, at the expense of considerably longer run-times and more effort on the part of the user. If experiments with these facilities show that there is a significant benefit in turning the loops "inside-out", then it may be possible to implement the idea in a more time-efficient manner.

The importsky configuration parameter:

The main feature in the new scheme is the facility to supply an initial estimate of the sky when running MAKEMAP. This is done by specifying 

importsky=ref

in the config file, and then assigning the NDF containing the  initial sky values to the "REF" ADAM parameter on the MAKEMAP command line:

% makemap ref=mysky.sdf

Before the first iteration starts, the specified sky map is sampled at the position of each bolometer value, and each sampled sky value is subtracted from the corresponding cleaned bolometer value. The remaining residuals are then processed as normal by estimating and subtracting (typically) COM and FLT models. The values sampled from the sky map are then added back onto the residuals, which are binned into a new map - the first "itermap". From then on, the algorithm proceeds to do the second and subsequent iterations in the usual manner.

This allows us to produce the "inside-out" looping described above by repeatedly re-running MAKEMAP, doing only one iteration on each run, and supplying the output from one invocation as the initial sky map for the next invocation. In effect, you are doing the "iteration" loop by hand at the shell level.
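
As a sketch, the whole process might be scripted as follows (a minimal bash sketch, ignoring the caching speed-ups described below; file and config names are placeholders, and the final pass would need the unmasking tweak described later):

# First pass: make an initial map, doing a single iteration (numiter=1).
makemap in='raw/*.sdf' ref=mymask config=^conf_first out=map1

# Later passes: subtract the previous map as the initial sky (importsky=ref).
prev=map1
for iter in 2 3 4; do
    makemap in='raw/*.sdf' ref=$prev config=^conf_next out=map$iter
    prev=map$iter
done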

Speeding things up:

This is slow. But with patience and some experimentation, inside-out looping can be performed. It can be made somewhat faster by caching some of the processing performed by the first invocation for use by subsequent invocations. Two things can be done:
  1. Caching the cleaned bolometer values. If "exportclean=1" is included in the config file on the first invocation of MAKEMAP, the cleaned time-series data will be written to disk in NDFs that end with the suffix "_cln.sdf". These NDFs can be used in place of the raw data as input for all subsequent invocations of MAKEMAP. In this case you should create a new config file for the second and subsequent invocations and add "doclean=0" to it, so that the cleaned input data will not be re-cleaned.
  2. Caching the EXT model values. Calculation of the EXT model values happens only once, before the first iteration starts. So if you add "exportndf=ext" to the config file for the first invocation, a set of NDFs with suffix "_ext.sdf" will be created holding the EXT model values. These can be supplied as input to the second and subsequent invocations by adding "ext.import=1" to the config file. Note that MAKEMAP uses fixed, pre-defined names for the NDFs when writing and reading EXT model values, so the NDFs created on the first invocation should not be moved or re-named. We also add "noexportsetbad=1" to prevent EXT values for flagged bolometers from being set bad in the exported NDFs (since a different set of bolometers may potentially be flagged as bad on subsequent invocations).
There are other possibilities for speeding things up, such as caching the LUT model, but these require changes to the code, which have not yet been implemented.

Other parameter settings:

  • Setting "numiter=1" is required to ensure only one iteration is performed by each invocation of MAKEMAP.
  • The NOI model, which contains estimates of the noise in each bolometer time stream, is normally calculated at the end of the first iteration, once the final residuals are known. This is no good for us here since each invocation only performs one iteration. The simplest solution is to add "noi.calcfirst=1" to the configuration for every invocation. This forces the noise estimates to be made before the start of the first iteration.
  • Care needs to be taken if any AST masking is used. Since we have set "numiter=1", the first iteration is also the last iteration, so we need to add "ast.zero_notlast=0" to the configuration; without this, no masking would be performed on any invocation of MAKEMAP. However, we do want to suppress masking on the final invocation, and so for that invocation we should revert to the default value of "ast.zero_notlast=1".
  • If an external mask is to be used, it should be supplied as normal as the REF parameter on the first invocation. The output map from the first invocation will have this same mask and so will mask the AST model correctly when supplied as the REF parameter on the second and subsequent invocations. 

An example:

In this example, four invocations are performed. How to choose the best number of invocations is still to be determined. Continuing until the RMS change within the source regions between successive maps is less than some target figure may be the way to go.
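
As one hedged possibility, successive maps could be differenced with KAPPA's sub command and the statistics of the difference examined with stats (an NDF section or a mask could restrict this to the source regions; file names are placeholders):

% sub in1=map2 in2=map1 out=diff12
% stats diff12          # note the standard deviation of the difference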

First invocation:

% makemap in=/SCUBA2/s8\*/20111114/00055/\* ref=mymask config=^conf1 out=map1

We make an initial map from the raw 850 um data files for 20111114 obs. 55 using "mymask.sdf" as an external mask. The output map is put into "map1.sdf". The configuration file "conf1" contains:

^/star/share/smurf/dimmconfig_bright_extended.lis
numiter=1          # Only do one iteration
850.flt.filt_edge_largescale=600
450.flt.filt_edge_largescale=600
ast.zero_mask=1
ast.zero_notlast = 0          # Mask the final (i.e. first) iteration
noi.calcfirst=1          # Calculate NOI before iterating
exportNDF=ext          # Export the EXT model values to NDFs
noexportsetbad=1        # Export good EXT values for bad bolometers
exportclean=1          # Export the cleaned  time-series data to NDFs

Second invocation:

% makemap in=./\*_cln.sdf ref=map1 config=^conf2 out=map2

We make a new map from the cleaned data files "*_cln.sdf" created by the first invocation (we assume the current directory contains no other cleaned data, for instance from an earlier run). The map created by the first invocation is supplied as the REF image, and the new map is put in "map2.sdf". The "conf2" file sets the importsky parameter to indicate that the initial sky supplied by the REF image should be subtracted from the time-series before starting the first (and only) iteration. 

^conf1          # Inherit all the settings in "conf1"
exportNDF=0         #  Do not export the EXT model this time
exportclean=0          # Do not export the cleaned data this time
doclean=0          # No need to clean the data this time
importsky=ref          # Subtract off the REF values at the start
ext.import=1          # Import the EXT values created on the first invocation

Third invocation:

% makemap in=./\*_cln.sdf ref=map2 config=^conf2 out=map3

Later invocations use the map created by the previous invocation as REF, and use the same config file as the second invocation.

Last (=fourth) invocation:

% makemap in=./\*_cln.sdf ref=map3 config=^conf3 out=final-map

On the last invocation we need to prevent the mask being applied, so we use a new config:

^conf2          # Inherit all the settings in "conf2"
ast.zero_notlast = 1          # Do not mask the final (i.e. first) iteration

2012-10-19

Using Multiple Masks

Today I made some changes to the way that MAKEMAP handles masks for the AST, FLT and COM models. Previously, only one mask could be used with each model. So for instance if the following configuration was used:

ast.zero_mask = 1
ast.zero_lowhits = 0.1

the AST.ZERO_MASK setting would take precedence over the AST.ZERO_LOWHITS setting, resulting in the external mask supplied via the REF parameter being used, and the AST.ZERO_LOWHITS setting ignored.

The behaviour now is to combine such masks together. So in the above example the external mask and the lowhits mask would be combined to create a single mask. The combination is done in a manner determined by the AST.ZERO_UNION parameter. If this parameter is true (i.e. non-zero), a pixel in the combined mask is considered a "source" pixel if it is flagged as a source pixel in any of the individual masks (i.e. the combined source area is the union of the individual source areas). If AST.ZERO_UNION is false (i.e. zero), a pixel in the combined mask is considered a "source" pixel only if it is flagged as a source pixel in all of the individual masks (i.e. the combined source area is the intersection of the individual source areas). The default for AST.ZERO_UNION is 1 - that is, the union of the masks is used by default.
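
For example, to mask the AST model using only those pixels flagged as source in both the external mask and the lowhits mask, the intersection can be requested explicitly:

ast.zero_mask = 1
ast.zero_lowhits = 0.1
ast.zero_union = 0          # use the intersection rather than the default union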

In addition, a different external mask can now be used with each model. This is achieved by using a new syntax for the XXX.ZERO_MASK configuration parameters. Each of these configuration parameters can now be given the name of the ADAM parameter that is to be used to get the corresponding mask. [Just to remind you, "ADAM" parameters are different to configuration parameters - all the configuration parameters are obtained using a single ADAM parameter called CONFIG.] External masks can be specified using any of the ADAM parameters "REF", "MASK2" and "MASK3" ("REF" serves the purpose of "MASK1"). The REF parameter has been around for a long time, but the MASK2 and MASK3 parameters are new.

So for instance, to  use different external masks for the AST and FLT models, you could do:

ast.zero_mask = REF
flt.zero_mask = MASK2

and then run makemap as

% makemap ref=ast_mask.sdf mask2=flt_mask.sdf

Or to use the same mask for both models, you could do:

ast.zero_mask = REF
flt.zero_mask = REF

% makemap ref=mask.sdf


The old XXX.ZERO_MASK syntax is still supported. Supplying a positive integer value causes the mask to be accessed using the REF parameter, as before. The default for each XXX.ZERO_MASK parameter is still zero, meaning that no external mask is used.

One twist to beware of with the new scheme is that the REF parameter is treated somewhat differently to the MASK2 and MASK3 parameters. The REF parameter is still used, as it always has been, to define the pixel grid of the output map. This means that any mask supplied via the REF parameter will, by definition, be aligned with the output map. The same is not true of the MASK2 and MASK3 parameters. If you use these parameters, you must ensure that the mask NDFs are aligned in pixel coordinates with the output map. The easiest way to do this is to create your mask from the output map of a previous similar run of makemap.
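
As an illustrative sketch, a mask for MASK2 could be made from the output map of a previous run using KAPPA's thresh command (the threshold value is a placeholder, and you should check SUN/258 for the good/bad pixel convention that makemap expects in a mask):

% thresh in=previous_map out=flt_mask thrlo=0.005 thrhi=0.005 newlo=bad newhi=1
% makemap in=\*.sdf ref=ast_mask mask2=flt_mask config=^conf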

2012-09-21

SCUBA-2 Data Reduction Manual 

A new cookbook, sc21, is available describing the reduction of SCUBA-2 data. This cookbook focusses on the Dynamic Iterative Map-Maker (or map-maker), which performs the processing of raw data into a science-grade map, along with common post-processing steps.

http://www.starlink.ac.uk/docs/sc21.htx/sc21.html

An explanation of how the map-maker works is given, including details of the default parameters which control each stage. Specialised configuration files which alter these parameters to suit different science goals are also introduced.

The cookbook guides users through all the post-processing options, from cropping and co-adding maps using PICARD, to applying an FCF and calculating the noise. The science pipeline is also discussed, with instructions for running it on a local machine.

Chapter List:
1. Introduction
2. SCUBA-2 Overview
3. Raw SCUBA-2 Data
4. The Dynamic Iterative Map-Maker
5. Reducing your Data
6. Tweaking the Configuration File
7. Examples of Different Reductions
8. The SCUBA-2 Pipeline
9. SCUBA-2 Data Calibration

Processing CLS data with ORAC-DR

A dedicated recipe has been written to process data from the Cosmology Legacy Survey using a "jack-knife" method of estimating and removing residual large-scale noise. The recipe is based on a script written by Ed for S2SRO data (and used to produce the figures in the SMURF paper). In this article I will describe how to use the recipe, what steps it performs and what parameters may be used to control the processing. In the future, this technique may be made more widely available for applying to other data sets (not just blank field ones).

How do I use it?

To use this recipe, initialize the pipeline, create a text file with the names of all the files to process and a file with recipe parameters if necessary. Then add the recipe name REDUCE_CLS to the usual command line:

% oracdr -loop file -files filenames.lis -recpars params.ini \
  -log sf -nodisplay REDUCE_CLS

By default the recipe uses the "blank field" makemap config file (though this may be overridden with the MAKEMAP_CONFIG recipe parameter; see below). The recipe creates a slew of output files with the following suffixes:
  1. _fmos - signal map (1 for each observation)
  2. _mappsf - map-filtered PSF (1 for each observation). This is the same as the above signal map but with an artificial gaussian added to the time-series (located at the map centre).
  3. _wmos - coadded signal map
  4. _wpsf - coadded map-filtered PSF
  5. _jkmap - jack-knife map
  6. _whiten - whitened signal map
  7. _whitepsf - whitened, map-filtered PSF map
  8. _cal - calibrated, whitened signal map
  9. _mf - above map processed with matched-filter (using the _whitepsf map as the PSF)
  10. _snr - signal-to-noise map created from above map (_mf)
Note that the per-observation files begin with s, and the coadds (and subsequent products) begin with gs. The ones you are probably most interested in are the _mf and _snr files.

What does it do?

The recipe works as follows.
  1. Each observation is processed separately to produce a signal map (_fmos). In addition, each observation is re-processed with an artificial gaussian source added at the map centre (this will be used to create the "map-filtered PSF" image, _mappsf).
  2. The signal maps are combined using inverse-variance weighting to create a coadded signal map (_wmos). The images with the artificial gaussians added in are also coadded to produce the "map-filtered PSF" (_wpsf).
  3. The data are split into two groups made from alternating observations. Each group is coadded, and the two coadds are subtracted to produce the jack-knife map (_jkmap).
  4. The SMURF command sc2filtermap is used to estimate the radial angular power spectrum (circular symmetry is assumed) within a region defined by twice the minimum noise in the coadded signal map. (The size of this region is written to the FITS header of the output map under the keyword WHITEBOX.)
  5. The inverse of this angular power spectrum is applied to the coadded signal and PSF images (_whiten and _whitepsf) to remove residual low spatial-frequency noise.
  6. The amplitude of the fake source in _whitepsf is compared with the input value to derive a corrected FCF. This new FCF is used to calibrate the whitened signal map to create the _cal map.
  7. A matched filter is applied to _cal using _whitepsf as the PSF to create _mf.
  8. Finally, a signal-to-noise ratio image is created (_snr).

What parameters are available?

The following recipe parameters can be used to control the processing (a sample parameter file is sketched after the list):
  • FAKEMAP_SCALE - Amplitude of the fake source (in Jy) added to the time-series to assess the map-making response to a point source.
  • MAKEMAP_CONFIG - Name of a config file for use with the SMURF makemap task. The file must exist in the current working directory, $MAKEMAP_CONFIG_DIR, $ORAC_DATA_OUT, $ORAC_DATA_CAL or $STARLINK_DIR/share/smurf.
  • MAKEMAP_PIXSIZE - Pixel size in arcsec for the output map. Default is wavelength dependent (4 arcsec at 850 um, 2 arcsec at 450 um).
  • WHITEN_BOX - Size of the region used to calculate the angular power spectrum for removing residual low-frequency noise in the data. Default is a square region bounded by the noise being less than twice the minimum value.
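
As an illustration, a params.ini file might look like this (all values are placeholders; the [REDUCE_CLS] section heading follows the usual ORAC-DR recipe-parameter convention):

[REDUCE_CLS]
FAKEMAP_SCALE = 10            # Jy; amplitude of the artificial gaussian
MAKEMAP_CONFIG = my_cls_config.lis
MAKEMAP_PIXSIZE = 4           # arcsec
WHITEN_BOX = 90               # size of the whitening region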

What if I want to run it again?

In addition to the full pipeline recipe, there is a PICARD recipe called SCUBA2_JACKKNIFE which performs all of the post-map-making steps. This can be used to examine the influence of including or omitting individual observations (say, ones with visible artefacts that the pipeline is not able to trap), to investigate the effect of varying the size of the whitening region, or to trim the images to a specific size before re-running the jack-knife steps.

Note that the pipeline must have been run once to produce all the necessary files which go into this recipe. Note also that the full pipeline recipe must be run again if a different config file or input gaussian amplitude is to be used.

The majority of the flexibility in controlling the processing occurs after the individual signal maps have been created. As a refresher, running this recipe would mean typing:

% picard -log sf -recpars params.ini SCUBA2_JACKKNIFE myfiles*.sdf

All of the control is through the recipe parameters in the file params.ini. The most important item to note is that you should provide a map-filtered PSF (otherwise a default will be used). However, you already have one of these: the file ending _mappsf.sdf listed above. The relevant parameters are (a sample parameter file is sketched after the list):
  • PSF_MATCHFILTER - the name of the map-filtered PSF file
  • WHITEN_BOX - Size of the region used to calculate the angular power spectrum for removing residual low-frequency noise in the data. Default is a square region bounded by the noise being less than twice the minimum value.
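
A corresponding params.ini for the PICARD recipe might look like this (the PSF file name is a placeholder for your own _mappsf file):

[SCUBA2_JACKKNIFE]
PSF_MATCHFILTER = s20120901_00022_850_mappsf.sdf
WHITEN_BOX = 90
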
This is only the first version of the recipe, and its limitations have yet to be determined. However, one case I have found where the recipe fails is in a large, multi-field map with highly-variable noise. In this case it might be better to process each field separately before creating the mosaic (though this has the disadvantage that different filtering may be applied to each field). Try it out and see what you get. All feedback welcome.

2012-09-12

FIT1D: A new SMURF command for ACSIS data

More time ago than I am willing to admit, I started coding a Starlink routine to fit spectral lines in ACSIS cubes. I got a long way until SCUBA-2 commissioning and calibration put a halt to it, but I have finally managed to finish the program, technically a beta-version, as part of the upcoming Kapuahi release of Starlink.

Usage:
% fit1d  in  out  rms  [config]  [userval]  [pardir]  [parndf]  [parcomp]

What distinguishes FIT1D from other profile-fitting routines is that it specifically attempts to deal with two issues: non-gaussian profile shapes, and the fact that ACSIS data-cubes have many, potentially very different, profiles to fit. Regarding the latter: many fitting routines produce fitted profiles for data-cubes, but FIT1D also produces cubes with the fitted parameters themselves, and can use such files as input to give control over the fit of, in principle, each individual profile in the cube. Thus it is possible, for example, to fit broad lines on the nucleus of a galaxy and narrow lines everywhere else. More about that below.

A section about FIT1D has been added to the SMURF documentation in SUN/258, which I will try to summarize here. FIT1D is generic in that it can fit profiles along any axis of a hyper-cube of up to 7 dimensions, but it will be discussed here in the context of a default RA-Dec-Vel ACSIS cube. Note that the routine assumes that the data have been baseline-subtracted, using e.g. MFITTREND, i.e. that the profiles have a zero-level at 0.
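
As a hedged example of a typical run (file names and the rms value are placeholders, and the mfittrend settings assume a simple automatic linear baseline):

% mfittrend in=cube axis=3 order=1 auto subtract out=cube_bl
% fit1d in=cube_bl out=cube_fit rms=0.25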


[Figure: Gauss-Hermite shapes as a function of the third-order skewness coefficient h3 and the fourth-order kurtosis (peakiness) coefficient h4. The red box indicates the limits on acceptable values for h3 and h4 as defined in the default configuration file. Note that the fitted profile by default is restricted to positive values and will omit the negative features shown.]

Non-gaussian Profiles.

FIT1D essentially re-implements the fitting code for non-gaussian profiles from the GIPSY package (Kapteyn Institute, Groningen, The Netherlands). Function types that can be fitted are Gaussian, Gauss-Hermite, and Voigt profiles. In particular, Gauss-Hermite functions are a powerful extension when fitting profiles that are skewed, peaky, or only approximately gaussian. The figure above shows Gauss-Hermite profiles as a function of the skewness coefficient h3 and the kurtosis (peakiness) coefficient h4. See SUN/258 for further details, but note that the default setting in the configuration file is for FIT1D to suppress the negative features in the fitted profiles and to leave only the positive part of Gauss-Hermites.

Because of their ability to fit distorted shapes, Gauss-Hermites are particularly well suited to "capture" the maximum amount of emission from a cube. The fits can be remarkably accurate, as is shown in the figure below of a 3-component fit (i.e. up to 3 spectral lines) using gausshermite2 functions (i.e. fitting both h3 and h4). Collapsing the resulting cube of fitted profiles can thus result in an accurate and almost noise-free white-light or total-emission map.

[Figure: Black: original profiles; Red: results of a 3-component Gauss-Hermite2 fit (fitting both h3 and h4).]


FIT1D derives its ability to fit a complex line-shape both from the Gauss-Hermite function and from the fact that it can fit multiple (sub-)components to get the best match possible. However, that can make the interpretation of the fits in terms of physical characteristics and quantities difficult, hence for those you may also want to fit the line-shape using a single standard Gaussian function.

Component Parameter files

Besides a data-cube with the fitted profiles, FIT1D also outputs so-called Component parameter files as NDF extensions in the header of the output file. These can also be copied out as independent data-cubes. There is a file for each component (i.e. line) that was fitted along the profile, up to the number of components requested by the user. Each plane of a Component parameter file holds an image of the value of a fitted parameter across the field-of-view. For instance, the file resulting from a gaussian fit has images showing, respectively, the fitted Amplitude, Position (velocity), and FWHM, as well as a plane with an id-number of the function used.
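
For example, a Component parameter file might be copied out as a stand-alone NDF with NDFCOPY using an HDS component path (the extension and component names below are guesses; run HDSTRACE on the output file to see the actual names):

% hdstrace cube_fit.more
% ndfcopy in=cube_fit.more.smurf_fit1d.comp_1 out=comp1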

Much of the (anticipated) use of FIT1D derives from the fact that Component parameter files can be used as input as well, either to provide initial estimates or fixed values to the fitting routine. The difference between values specified in a Component parameter file and ones declared in a User parameter values file is that the former can vary across the field-of-view, whereas the latter result in the same value being used for all profiles. For example, with spectral-line surveys the User parameter values file can be used to provide initial estimates of the frequencies or velocities at which lines are expected, or to fix fits at those frequencies.

By manipulating Component parameter files, e.g. ones resulting from an initial fit, the user can customize or correct subsequent fits. In the extreme, a Component parameter file could be made from scratch based on a model and used to create a spectral-line data-cube of that model (config option: model_only=1), or be used as initial estimates for a fit. Of more practical use, Component parameter files can be used to correct problems associated with a fit, since the art of fitting is not in the fitting algorithm but in providing accurate initial estimates. For instance, the left image below shows a section of an Amplitude plane of a fit where there are problems in a few locations. After setting these locations to bad values and using FILLBAD to interpolate over them, the corrected Component parameter file was used as the initial estimate for a subsequent fit, resulting in the image on the right (a sketch of this repair is given after the figure).

[Figure: Left: section of a parameter file showing the originally fitted amplitudes; Right: amplitudes after using a corrected parameter file from the original fit as initial estimates for a subsequent fit.]
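
A minimal sketch of that repair, assuming the bad regions are described in an ARD file (bad.ard) and using KAPPA's ardmask and fillbad (file names are placeholders):

% ardmask in=comp1 ardfile=bad.ard out=comp1_masked       # set the bad fits to bad values
% fillbad in=comp1_masked out=comp1_fixed niter=4         # interpolate across them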

More creative options are possible: after an initial fit with a gaussian, the function id can be changed to a gausshermite1 in part of the field, and the resulting file used as initial estimates for a subsequent fit to account for skewed profiles there. Similarly, the initial guess of the FWHM can be made wide on, e.g., the nucleus of a galaxy, while leaving it narrower outside. As another example, the fit of multiple components can be limited to only part of the field by setting the parameter file for the second and higher components to bad values outside the relevant region (multiple Component parameter files can be used as input: one for each component to be fitted).

In conclusion: please remember that this is a beta-release and you may run into unanticipated issues. The limits chosen in the configuration file may also need tweaking. If an initial fit looks poor, try adjusting minamp (in units of the rms!) or, in particular, minfwhm (in units of pixels!) in the configuration file (see: $SMURF_DIR/smurf_fit1d.def). Also use range to limit the fit to the relevant region of the spectrum.
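
A minimal sketch of such a configuration file, using only the parameters mentioned above (the values are illustrative and the range syntax is an assumption; see $SMURF_DIR/smurf_fit1d.def for the authoritative names and formats):

minamp = 3            # minimum acceptable amplitude, in units of the rms
minfwhm = 1.88        # minimum acceptable FWHM, in pixels
range = -100,100      # spectral range to fit (assumed syntax)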

The released implementation of FIT1D can fit up to 7 components per profile per run, but the output of multiple runs, each covering a range in velocities or frequencies, can be combined. The fit itself is fully multi-threaded and will be much faster on a modern multi-core computer: a 3-component gausshermite2 fit of 1.1 million spectra (a 2 GB input file) took 15 minutes on a dual-core machine with 16 GB of memory, versus 4 minutes on one with 12 cores and 75 GB.

Happy fitting!

Remo



2012-08-23

SCUBA-2 reference publications webpage

Please see the following page for appropriate references for the SCUBA-2 instrument, calibration and data reduction:


http://www.jach.hawaii.edu/JCMT/continuum/scuba2/scuba2_references.html

This includes an arXiv link to the Dempsey et al. 2012 SPIE paper on SCUBA-2 commissioning, which can be used as the reference for SCUBA-2 calibration.
http://arxiv.org/abs/1208.4622

A trilogy of up-to-date SCUBA-2 papers on the instrument, data reduction and calibration is currently ready for submission. Please check back to the above link periodically; the new links will be added when the papers are in press.

2012-08-16

Updates to FCFs and extinction correction

Over the past couple of months the flux conversion factors and extinction corrections have been updated, and we are now very happy with the answers. The main issue from the user perspective is that the FCF you apply depends critically on which version of SMURF was used to generate the map. At the time of writing, data products downloaded from CADC use an older extinction correction than the one used by the current SMURF. We plan to update CADC processing shortly, but reprocessing the historical archive will take some time.

We have set up a web page at JAC listing the parameters that should be used and instructions on how to determine which version of the software was used to generate your map:

http://www.jach.hawaii.edu/JCMT/continuum/scuba2/scuba2_relations.html