
2013-10-08

Holes in heterodyne maps

HARP has had only twelve operational receptors since May 20.  If one or two of these also suffer from interference and fail quality assurance, there can be locations in the reduced spectral cubes where no valid data are present.  Such missing spectra (shown in blue) will often be arranged in a grid pattern, as in the example below.


This can create issues if you wish to integrate flux or simply improve the cosmetic appearance.

One approach is to smooth the data, but this degrades all the spectra.  Instead use KAPPA::FILLBAD, which replaces just the bad values with a smooth function derived from neighbouring values by iterating to a solution of Laplace's equation.

Just using FILLBAD's default values will duly fill in the holes.  You might prefer not to smooth along the spectral axis, by setting Parameter SIZE to [s,s,0], where s is the initial scale length in pixels; the zero means do not smooth along the third axis.  The scale length should be about half the size of the largest region of bad data to be replaced.  Since the largest bad regions, apart from the cube peripheries, are two pixels across, a command like this
% fillbad in=holey out=filled size="[1,1,0]"
is appropriate.  The graphic below shows the same region as before, but with the holes filled in.


FILLBAD does what it says on the tin.

There will be some smoothing, but it's not obvious.  Below is a KAPPA::CLINPLOT graphic for a part of the filled cube.  Each spectrum is a pixel in the image above. Can you identify the three spectra which were previously bad?




2013-07-18

Inclusion of CSO Tau Fits for Determining FCFs

During the first few months of 2013 the WVM at JCMT had several periods of instability where it was unreliable for determining the value of tau. Because of this, members of the science team collaborated to produce a collection of smooth polynomial fits to the tau values from the 225-GHz tau meter at the CSO for the affected nights, which can be used to perform extinction correction in place of the WVM tau values.

In the latest version of Starlink (which you can get by rsyncing from /stardev at the JAC), makemap will now first check whether the date of the observation is one of the (somewhat sporadic) dates when the WVM was unstable. This occurs as long as the ext.tausrc parameter is set to "auto," which it is by default. If the date is one of the affected dates, makemap will look for an available fit to the CSO data in the file specified by the parameter ext.csofit, which is set up by default to refer to the collection of CSO fits produced at the JAC. If makemap cannot find a fit for an observation in the specified path, it will print a warning that no fit or WVM data are available and refuse to reduce the observation, though this shouldn't ever happen in normal operation.
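To illustrate, a minimal configuration fragment touching these parameters might look like the following (a sketch only: the file names and the fit-file path are placeholders, and SUN/258 is the authoritative reference for the ext.* parameters):

ext.tausrc = auto                   # choose WVM or CSO fits automatically, based on the date
ext.csofit = /path/to/csofit/file   # collection of CSO tau fits (the default points to the JAC-produced file)

% makemap in=^myfiles.lis out=mymap config=^myconfig.lis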

If you have observations taken between January 19 and May 14 of this year, using the latest version of Starlink rsynced from /stardev at the JAC will help ensure that you get the best extinction data available.

The graph below shows a comparison between the FCFs derived from WVM data and those derived using the new CSO tau fits over the period of January-May 2013. The blue diamonds represent FCFs from the WVM, and the tan circles are FCFs from the CSO tau fits.


2013-02-26

Checking the noise properties of SCUBA-2 science data

Ever wondered if there was an easy way to assess the noise properties of the data going into your SCUBA-2 maps? Well now there is. A new recipe called ASSESS_DATA_NOISE has been added to the SCUBA-2 pipeline; it performs the standard quality-assurance analysis (as done by the quick-look pipeline at the telescope) to give you an idea of the noise properties of the science data.

Running ASSESS_DATA_NOISE

To run it, first put all the files you wish to process into a text file (called myfiles.lis in the example below). Create a recipe parameter file if necessary (myparams.ini below). Then initialize the pipeline as usual (not necessary if done previously):
% oracdr_scuba2_XXX -cwd
where XXX is the wavelength of choice. Finally, run the pipeline, adding the name of the recipe at the end of the command line:
% oracdr -loop file -files myfiles.lis -recpars myparams.ini \
  -log sf ASSESS_DATA_NOISE

The recipe displays its results in a KAPPA window: the top left shows the focal-plane layout with the noise for each bolometer, the lower left a histogram of the noise, and the lower right the focal-plane image of the noise equivalent power (NEP). (If you want to turn off this display, add -nodisplay to the ORAC-DR command line above.)

By default the results are calculated for each subscan (i.e., each 30-second file) separately. As described below, options exist to just use the first subscan or to calculate the noise properties for the entire time stream.

The focal-plane mosaics for each subscan are stacked to create a 3-d cube of noise as a function of time (the third axis is MJD). The corresponding NEP images are written into this file under the .more.smurf.nep extension. This file has the suffix _scinoistack.
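For example (the file name is illustrative), the NEP cube should be reachable with the usual KAPPA commands by addressing the extension directly:

% ndftrace mymap_scinoistack.more.smurf.nep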

The results are written out to a log file called log.scinoiseXXX (where XXX is the wavelength as above), which can be loaded into TOPCAT to display the number of bolometers, noise or weighted NEP as a function of time or subscan number for each subarray.

The recipe also writes out a file called log.bolonoise, which contains more results (with the results for each subarray on a separate line), plus index.noise and index.nep. From these files it will be possible to determine which files failed the QA tests and can thus be excluded from the input list to the map maker.

Of course, the downside to removing data is that the intended noise level will not be reached, and sensitivity to extended structure may be reduced (as the time series may be broken into smaller chunks). However, it should be easy to determine whether using some fraction of the best data gets as close as could be expected.

Customizing

As with most recipes, a number of recipe parameters are available:

  • NOISE_CONFIG - the config file you wish to use when calculating the noise. It is recommended that you use the same one as used when making maps. If not given, the standard noise config file is used.
  • NOISE_CALC - may be "quick", "each" or "full": "quick" uses only the first thirty seconds of data (for a quick check), "each" calculates the noise for every thirty-second data file, and "full" uses the entire time stream to calculate the overall noise properties. The default is "each".
  • FREQRANGE - frequency range over which to calculate the noise. Default is 2-10 Hz.
  • FREQLO - lower frequency at which to calculate the background. Default is 0.5 Hz.
See the SMURF documentation (SUN/258) for details of FREQRANGE and FREQLO before adjusting them. An example parameter file is sketched below.
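For reference, a sketch of what myparams.ini might contain (the parameter names are those listed above; the comma-separated form of FREQRANGE is an assumption):

[ASSESS_DATA_NOISE]
NOISE_CALC = each
NOISE_CONFIG = mynoise_config.lis
FREQRANGE = 2,10
FREQLO = 0.5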

This is just the first release, and its usefulness and output depend on user testing and feedback. Please try it out and let me know.

2013-02-10

Automated removal of bad-baseline spectra from ACSIS/HARP time series

I presented the following poster at November's ADASS Conference, describing the methods I added to the ORAC-DR REDUCE_SCIENCE_NARROWLINE and REDUCE_SCIENCE_GRADIENT recipes to detect and remove `wobbly' and noisy spectra. It shows some examples of the forms of interference identified at the time, together with example results.  Since then an additional form---ringing---has come to light from looking at more data, and is shown in the latest JCMT Newsletter 34. I'll discuss that in a future blog post.

Go here for the full-sized PDF.

2012-09-12

FIT1D: A new SMURF command for ACSIS data

More time ago than I am willing to admit, I started coding a Starlink routine to fit spectral lines in ACSIS cubes. I got a long way until SCUBA-2 commissioning and calibration put a halt to it, but I have finally managed to finish the program, technically a beta-version, as part of the upcoming Kapuahi release of Starlink.

Usage:
% fit1d  in  out  rms  [config]  [userval]  [pardir]  [parndf]  [parcomp]

What distinguishes FIT1D from other profile-fitting routines is that it specifically attempts to deal with two issues: non-gaussian profile shapes and the fact that ACSIS data-cubes have many, potentially very different, profiles to fit. Regarding the latter, there are many fitting routines that produce fitted profiles for data-cubes, but FIT1D also produces cubes with the fitted parameters themselves, and it can use such files as input to control the fit of, in principle, each individual profile in the cube. Thus it is possible, for example, to fit broad lines on the nucleus of a galaxy and narrow lines everywhere else. More about that below.

A section about FIT1D has been added to the SMURF documentation in SUN/258, which I will try to summarize here. FIT1D is generic in that it can fit profiles along any axis of a hyper-cube of up to 7 dimensions, but it will be discussed here in the context of a default RA-Dec-Vel ACSIS cube. Note that the routine assumes that the data have been baseline-subtracted, using e.g. MFITTREND, i.e. that the profiles have a zero-level at 0.
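As an illustrative sketch (the file names and the rms value, which I take to be the noise level of the data, are placeholders), a first fit with the default settings could be as simple as:

% fit1d in=cube_bsub out=cube_fit rms=0.2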


Gauss-Hermite shapes as a function of the 3rd-order skewness coefficient 'h3' and the 4th-order kurtosis (peakiness) coefficient 'h4'. The red box indicates the limits on acceptable values for h3 and h4 as defined in the default configuration file. Note that the fitted profile by default is restricted to positive values and will omit the negative features shown.

Non-gaussian Profiles.

FIT1D essentially re-implements the fitting code for non-gaussian profiles from the GIPSY package (Kapteyn Institute, Groningen, The Netherlands). Function types that can be fitted are Gaussian, Gauss-Hermite, and Voigt profiles. In particular, Gauss-Hermite functions are a powerful extension when fitting profiles that are skewed, peaky, or only approximately gaussian. The figure above shows Gauss-Hermite profiles as a function of the skewness coefficient h3 and the kurtosis (peakiness) coefficient h4. See SUN/258 for further details, but note that the default setting in the configuration file is for FIT1D to suppress the negative features in the fitted profiles and leave only the positive part of the Gauss-Hermites.

Because of their ability to fit distorted shapes, Gauss-Hermites are particularly well suited to "capture" the maximum amount of emission from a cube. The fits can be remarkably accurate, as shown in the figure below, which shows a 3-component fit (i.e. up to 3 spectral lines) using gausshermite2 functions (i.e. fitting both h3 and h4). Collapsing the resulting cube with fitted profiles can thus result in an accurate and almost noise-free white-light or total-emission map.
Fit1d - Black: original profiles; Red: results of a 3-component Gauss-Hermite2 fit (fitting both h3 and h4)
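To turn such a fitted cube into a total-emission map, one option is KAPPA::COLLAPSE along the spectral axis; a sketch, with illustrative file names and assuming axis 3 is the velocity axis:

% collapse in=cube_fit out=whitelight axis=3 estimator=integ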


FIT1D derives its ability to fit a complex line-shape both from the Gauss-Hermite function and from the fact that it can fit multiple (sub-)components to get the best match possible. However, that can make the interpretation of the fits in terms of physical characteristics and quantities difficult, so for those you may also want to make a fit of the line-shape using a single standard Gaussian function.

Component Parameter files

Besides a data-cube with the fitted profiles, FIT1D also outputs so-called Component parameter files as NDF extensions in the header of the output file. These can also be copied out as independent data-cubes. There is a file for each component (i.e. line) that was fitted along the profile, up to the number of components requested by the user. Each plane of a Component parameter file is an image of the value of a fitted parameter across the field-of-view. For instance, the file resulting from a Gaussian fit has planes showing the fitted Amplitude, Position (velocity), and FWHM, as well as a plane with an id-number of the function used.

Much of the (anticipated) use of FIT1D derives from the fact that Component parameter files can be used as input as well: either to provide initial estimates or fixed values to the fitting routine. The difference between values specified in a Component parameter file and ones declared in a User parameter values file is that the former can vary across the field-of-view, whereas the latter will result in the same value being used for all profiles. For use with spectral-line surveys, for example, the User parameter values file can be used to provide initial estimates of the frequencies or velocities at which lines are expected, or to fix fits at those frequencies.

By manipulating Component parameter files, e.g. ones resulting from an initial fit, the user can customize or correct subsequent fits. At the extreme, a Component parameter file could be made from scratch based on a model and be used to create a spectral-line data-cube with that model (config option: model_only=1), or be used as initial estimates for a fit. Of more practical use, Component parameter files can be used to correct problems associated with a fit, since the art of fitting is not in the fitting algorithm but in providing accurate initial estimates. For instance, the left image below shows a section of an Amplitude plane of a fit where there are problems in a few locations. Setting these locations to bad values and using FILLBAD to interpolate over them, the corrected Component parameter file was used as initial estimates for a subsequent fit, resulting in the image on the right.

Fit1d - Left: Section of a parameter file showing originally fitted amplitudes; Right: Amplitudes after using a corrected parameter file from the original fit as initial estimates for a subsequent fit.
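In command form, the correction workflow described above might look something like this (a sketch only: the file names are illustrative, the bad values are assumed to have already been set with a tool such as KAPPA::CHPIX, and parndf is the parameter from the usage line above):

% fillbad in=comp1_blanked out=comp1_fixed size="[2,2,0]"
% fit1d in=cube_bsub out=cube_refit rms=0.2 parndf=comp1_fixed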

More creative options are possible: after an initial fit with a gaussian, the function id can be changed to a gausshermite1 in part of the field and the resulting file used as initial estimates for a subsequent fit, to account for skewed profiles there. Similarly, the initial guess of the FWHM can be made wide on e.g. the nucleus of a galaxy while leaving it narrower outside. As another example, the fit of multiple components can be limited to only part of the field by setting the parameter file for the second and higher components to bad values outside the relevant region (multiple Component parameter files can be used as input: one for each component to be fitted).

In conclusion: please remember that this is a beta-release and that you may run into unanticipated issues. Also, the limits chosen in the configuration file may need tweaking. If an initial fit looks poor, try adjusting minamp (in units of rms!) or, in particular, minfwhm (in units of pixels!) in the configuration file (see: $SMURF_DIR/smurf_fit1d.def). Also use range to limit the fit to a relevant region of the spectrum.
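A sketch of such a tweak (the values are purely illustrative, and $SMURF_DIR/smurf_fit1d.def is the place to check the actual names, defaults and syntax): put the overrides in a small file, say myfit1d.def,

minamp  = 3      # minimum acceptable amplitude, in units of the rms (illustrative value)
minfwhm = 2      # minimum acceptable FWHM, in pixels (illustrative value)

and pass it to the fit, using the usual Starlink ^ indirection:

% fit1d in=cube_bsub out=cube_fit rms=0.2 config=^myfit1d.def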

The released implementation of FIT1D can fit up to 7 components per profile per run, but the output of multiple runs, each covering a range in velocities or frequencies, can be combined. The fit itself is fully multi-threaded and will be much faster on a modern multi-core computer: a 3-component gausshermite2 fit of 1.1 million spectra (a 2 GB input file) took 15 minutes on a dual-core machine with 16 GB of memory versus 4 minutes on one with 12 cores and 75 GB of memory.

Happy fitting!

Remo



2011-03-31

Automatically adding fake sources to real data, part 2

In a previous entry, I described how to use the new ORAC-DR recipe REDUCE_SCAN_FAKEMAP to automatically add fake sources to the raw time series in order to get a handle on the transfer function of the map-maker. That's all very well, but how about starting with something simpler, such as a Gaussian? Now you can add a Gaussian of given amplitude and FWHM without first creating an image to feed in to the pipeline. As before, the Gaussian properties are defined using recipe parameters:

[REDUCE_SCAN_FAKEMAP]
FAKEMAP_FWHM = 15
FAKEMAP_SCALE = 0.5
FAKEMAP_OFFSET = dx,dy

The FWHM is in arcsec and is the only mandatory parameter (the SCALE and OFFSET are optional). In this case, the FAKEMAP_SCALE parameter is the amplitude of the Gaussian in Jy/beam. The Gaussian can be positioned anywhere in the map by giving an offset (RA, Dec in arcsec).

The pipeline can help you out further by using a couple of useful defaults. If the word "beam" is given as the FWHM, then the pipeline will use the default beam size appropriate for the current wavelength (14" at 850 um, 8" at 450 um). If the amplitude is not specified then (somewhat arbitrary) defaults of 1 and 4 Jy/beam are assumed (at 850 and 450 um respectively).
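For instance, injecting a Gaussian with the default beam size and default amplitude needs nothing more than this (an illustrative sketch using the defaults described above):

[REDUCE_SCAN_FAKEMAP]
FAKEMAP_FWHM = beam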

Note that the FWHM must be greater than half the pixel scale (the pipeline will set it to that value if the given value is less), but in principle there is no upper limit. In practice the upper limit is defined by the array footprint (about 2.5 arcmin for S2SRO data): anything much larger than that will be removed by makemap as part of fitting the common mode.

As usual, update your Starlink installation (with rsync) to try it out.

2011-03-18

Recipe Parameters and the Science Archive

When processing jobs are submitted to the science archive ORAC-DR is configured to load a specialist recipe parameter file associated with the project. This can be used by PIs or surveys to tune processing to their liking. The idea is that PIs or survey teams send us their recipe parameter files and we then make sure that the processing jobs use them.

This week I've updated the recipe parameter system to allow tuning based on the object name as well as the recipe name. In some projects the objects are completely different so a single parameter file was not useful. Hopefully this change will encourage more people to send us parameter files.

To use the object-based scheme, simply append the object name to the recipe name in the file, separated by a colon. No spaces are allowed in the object name.

  [REDUCE_SCAN:M82]
  PAR1 = A
  PAR2 = B

If there is also a recipe-based entry, those parameters will be merged in:

  [REDUCE_SCAN]
  PAR1 = Aprime
  PAR3 = C

So in this example, if the object name is "M82", PAR1, PAR2 and PAR3 will be set, and PAR1 will have the value "A". If the object is not "M82", only PAR1 and PAR3 will be set, and PAR1 will have the value "Aprime".

This change is currently on the stardev rsync server, or if you have git installed you can update your own ORAC-DR distribution.

2011-01-18

Automatically adding fake sources to real data

Further to Ed's post on adding fake sources to SCUBA-2 maps, I've added some new functionality to the pipeline which allows this process to be automated.

There is a new recipe called REDUCE_SCAN_FAKEMAP which should be specified on the command-line when running the pipeline. (Normally the recipe is chosen automatically.) This recipe behaves in exactly the same way as the standard recipe, and just passes the given `fake' map (adjusted and scaled as necessary) to the map-maker.

The recipe has a small number of parameters which can be given in addition to those already accepted by the pipeline (e.g. an alternative makemap configuration file). These parameters are all prefixed with "FAKEMAP_":

[REDUCE_SCAN_FAKEMAP]
FAKEMAP_MAP = mymap.sdf
FAKEMAP_SCALE = 0.5
FAKEMAP_REGRID = 1/0
FAKEMAP_OFFSET = dx,dy

The name of the base input map is mandatory (otherwise the pipeline exits immediately). The remaining parameters are optional. The SCALE parameter is the same as the "fakescale" parameter used by the map-maker and defaults to 1 if not given. Two further parameters control how the input map is adjusted before adding to the raw timeseries data.

The OFFSET parameter is a pair of RA,Dec offsets in arcsec which allow the input map to be shifted so that it doesn't coincide with a source in the SCUBA-2 data. The default is not to shift the input map. The offsets are applied to the source: i.e. a shift of 60,60 puts a source originally at (02:30:00.0, 00:00:00.0) at (02:30:04.0, +00:01:00.0) in the output map.

The REGRID parameter is a flag which, if true, regrids the input map to match the pixel coordinates of the output map (i.e. same pixel bounds and pixel size). The default is false, unless an OFFSET is given: in this case the input map must be aligned (regridded) to match the output map so that the pixel bounds are correct. Note that the original input file is not modified: the pipeline makes a copy and operates on that.

Here's an example of calling this recipe on the command line (processing a list of files contained in mydata.lis):

% oracdr -loop file -files mydata.lis -recpars myrecpars.ini REDUCE_SCAN_FAKEMAP

This new capability is available now with an update of your Starlink installation. Please try it out and let me know how well it worked (or didn't!), or send a message to the scuba2dr mailing list.

2011-01-11

Correcting for pointing drifts with PICARD

If you're lucky enough to have a bright source in your map, you may be able to improve the image fidelity by correcting for pointing offsets between observations. The pipeline now has this capability and it is applied automatically for calibrators, but not for most science targets. I've added a new PICARD recipe, called SCUBA2_REGISTER_IMAGES, which allows you to shift your images to line up at a common reference position before mosaicking. Here's how to do it.

First step, upgrade to the latest Starlink version via rsync.

Step two, create a recipe parameter file (say, params.ini) and add the following lines to it:

[SCUBA2_REGISTER_IMAGES]
REGISTER_IMAGES = 1
REGISTER_RA = HH:MM:SS.S
REGISTER_DEC = +DD:MM:SS.S

where "HH:MM:SS.S" and "+DD:MM:SS.S" are the RA and Dec of the source you wish to use.

Then run

% picard -log sf -recpars params.ini SCUBA2_REGISTER_IMAGES myfiles*.sdf

and if all goes well, you will end up with a series of output files, one for every input file, which end in "_reg.sdf". These files can then be coadded using MOSAIC_JCMT_IMAGES.
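The coadd is then just another PICARD call, for example (illustrative file pattern):

% picard -log sf MOSAIC_JCMT_IMAGES *_reg.sdf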

The recipe works by fitting a 2-d profile to the brightest source within a 2x2 arcmin box centred on the given reference position. The WCS is adjusted to place the reference position at the position reported by the fit.

However, note that this method will only work if the source you're using is present in every image, is compact and clearly resolved from other nearby sources, and is the brightest source within the 2x2 arcmin search box in every input image. It won't work for deep extragalactic maps, nor does it correct for any pointing drifts that occur within an observation. Correcting for those drifts is much more involved, but it is something we'll investigate over the coming months. The recipe has been tested on HL Tau, OMC-1 and AFGL4029 in W5-East.

It's also possible to do the registering and coadding with a single call to MOSAIC_JCMT_IMAGES. In this case you'd want to add the following to a recipe parameter file (in addition to any other coadding parameters):

[MOSAIC_JCMT_IMAGES]
REGISTER_IMAGES = 1
REGISTER_RA = HH:MM:SS.S
REGISTER_DEC = +DD:MM:SS.S
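Then run MOSAIC_JCMT_IMAGES in the same way as above, e.g. (a sketch, reusing the parameter file name from the earlier example):

% picard -log sf -recpars params.ini MOSAIC_JCMT_IMAGES myfiles*.sdf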


Here's an example using some observations of HL Tau data taken in December 2009. The corrections were as much as 8.5 arcsec in RA (the Dec corrections were less, up to 3 arcsec). The improvements are modest at 850 um (of order 10% increase in peak flux and a few per cent reduction in beam size) but much more significant at 450 um. The peak flux after correcting for pointing drifts is more than 20% higher than before, and the FWHM has decreased by 15% (9.2" to 7.9"). Furthermore, the source elongation (ratio of major and minor axes) is reduced from 16% to less than 3%. Before and after images are shown below.

Left: HL Tau at 450 um, coadd of 9 separate observations from 20091213 and 20091214. Right: same data after registering each image before coadding. Both images are scaled between the same limits.

2010-11-29

SCUBA-2 at the ADASS conference

At the start of November I attended the ADASS conference in Boston and presented a poster paper on SCUBA-2 data processing. Readers of this blog probably won't find anything new in it, but just in case, the conference paper itself is now on astro-ph at http://arxiv.org/abs/1011.5876. Thanks to Dave Nutter for supplying the Orion B image.

2010-09-21

SCUBA-2 Pointings in the science archive

This is just a quick note to explain what has been happening to pointing observations in the science archive. Pointings are public data, so the reduced images of pointing sources account for a significant fraction of publicly available SCUBA-2 reduced data. We have realised that these data are problematic because we were running the standard pointing recipe at CADC, which filters out large-scale structure and clips the image in order to make it easier to detect the pointing source. This is not obvious to the casual user of the archive and has recently caused some confusion.


The image above (a 20 second 450 micron observation from 2010-02-19 #39) is the standard product for a pointing of OMC-1. As is obvious, a lot of extended emission has disappeared. This week we are going to start reprocessing pointing observations using the standard recipes so that the archive contains science products rather than products that aid telescope observing. As an example the image below is the same observation using the standard recipe and it is much more representative of what SCUBA-2 is capable of:

2009-10-02

Automated advanced processing at CADC

As previously mentioned, the ORAC-DR data reduction pipeline could be run at CADC to generate basic nightly products. This processing used to be run at JCMT on a nightly basis, with the products transferred to CADC.

This processing is now being run at CADC on their processing system. Processing requests are made at 0900 HST every day, and nightly products will be available sometime after that (depending on the amount of processing needed -- scans take longer to reduce than jiggles).

In the near future, effort will be undertaken to process the backlog of ACSIS data.

2008-01-28

scubadev

Scubadev's mysterious lack of speed continues to be a concern, especially since it is a baseline spec for the summit DR computers. I had a chat with Tim in which we concluded that after the Starlink release is out of the way (1-2 weeks) we will take it down and turn it into a standalone Gentoo box, then a standalone CentOS box. This should allow us to narrow down whether the problem is related to hardware, the OS, or the integration into the network.