2010-04-06

Detection and Correction of DC steps

Sudden changes in the DC level are seen to occur within bolometer time streams. The following algorithm attempts to identify and correct these steps. Each bolometer time stream is processed independently using the following two-part process.

Part 1 - Detecting Candidate Steps:

In this part we search the bolometer time stream for contiguous blocks of samples that have unusually large gradients compared to the neighbouring samples. Each such block of samples is considered to be a candidate DC jump. The starting and ending time of each such block is noted. Once all candidate jumps have been found for a bolometer, the algorithm proceeds to part 2. Candidate jumps are found as follows:
  1. Smooth the bolometer time stream using a median filter of width DCSMOOTH (defaults to 50) samples. This reduces the effects of the noise in the data. Using a median filter rather than a mean filter helps to preserve the sharp edges at each DC jump, and is robust against spikes, etc. The following figure shows a typical step, with the original bolometer data values in red and the median-smoothed data values in blue.
  2. At each sample, find the difference between the two adjacent values in the median-filtered bolometer data. This is proportional to the gradient of the median-filtered bolometer data. These gradients include both the gradient of the underlying noise-free median-filtered data stream, and the extra gradient caused by noise, spikes, DC jumps, etc. The following figure shows these differences for the above step. The high gradients at the step are clearly visible.
  3. We now need to split these differences up into differences caused by variation of the underlying noise-free data, and differences caused by noise, steps, spikes, etc. To do this we smooth the above differences, and use the smoothed differences as the gradient of the underlying noise-free data. In order to get an accurate result, we need to do this smoothing with a mean filter rather than a median filter. But a mean filter is badly affected by aberrant values, so we first take a copy of the differences found above, and remove extremely large differences by flagging them as bad. Specifically, differences more than three times the RMS difference (taken over the whole time stream) are set bad. The RMS is then re-calculated and a further two such rejection iterations are performed. This gets rid of high gradient values caused by spikes, jumps, point sources, etc., leaving the following:
  4. The differences close to a step are often not typical of those in the wider neighbourhood, so each block of contiguous values removed above is doubled in width, leaving the following:
  5. Smooth the remaining differences using a mean filter of width DCSMOOTH. The smoothing process fills in small holes in the array (up to 0.8*DCSMOOTH samples), but will not fill in larger holes. Such larger holes are filled in by linear interpolation. The resulting smoothed differences - shown in blue below, overlaid on the original differences in red - estimate the local gradient of the underlying noise-free median-filtered data stream.
  6. Subtract the smoothed differences from the total differences found in step 2 above. This gives the residual differences caused by noise, spikes, DC jumps, etc., shown below:
  7. To decide if a residual difference value is "unusually high" we need an estimate of the local RMS of these residual differences at each point in the data stream. To get this we smooth the squared residual gradients using a median filter of width DCSMOOTH2 (fixed at 200), and then take the square root of the resulting values.
  8. Divide the residual differences by the local RMS of the residual differences to get a "signal-to-noise" ratio for each residual difference value:
     
  9. Find all blocks of contiguous absolute SNR values greater than DCTHRESH (defaults to 25). Such blocks are allowed to contain small numbers of values that do not meet this criterion. The maximum length for such a section of low SNR is DCFILL (fixed at 40) samples. For the example step shown above, the block starts at 13696 and ends at 13703. Blocks are ignored if they fail any of the following tests:
  • The width of the block must be less than DCMAXWIDTH (fixed at 60)
  • The total of all the SNR values in the block must be at least DCTHRESH. This test avoids spikes being treated as two very close steps, since such steps will  have SNR values of opposite signs and will thus cancel out.
  • The total of all SNR values in the block must exceed half the value of the largest single SNR value in the block (this tends to reject spikes, since they will have large positive and negative SNR values within the block, which cancel out).

Part 2 - Measuring and Correcting Steps

In this part, each of the candidate steps found above is processed to determine the height of the step.  This is done by doing a least squares linear fit to the median-smoothed bolometer data just before the step, and another fit to the data just after the step. These two fits are used to predict the value at the centre of the step, and the difference between these two predicted values is used as the step height. Various other tests are performed to identify and reject spurious step detections, and finally the original bolometer data stream is corrected to take out the step. For a given bolometer, each candidate step is processed as follows:
  1. Correct the step start and end to include adjacent SNR values that are below DCTHRESH but are still significantly high. Specifically, the step start is moved to the end of the last block of DCNLOW (fixed at 5) SNR values that are all less than DCSIGLOW (fixed at 8.0), and which occurs before the original step start. Likewise, the step end is moved to the start of the earliest block of DCNLOW SNR values that are all less than DCSIGLOW, and which occurs after the original step end. For the example step shown above, the start moves from 13696 to 13693, and the end moves from 13703 to 13711.
  2. Perform a linear fit to the median-smoothed bolometer data before the step. The fit is over a box containing DCFITBOX (defaults to 30) samples. A gap of DCFITBOX samples is left between the end of the box and the start of the step.
  3. Perform a linear fit to the median-smoothed bolometer data after the step. The fit is over a box containing DCFITBOX samples, and a gap of DCFITBOX samples is left between the end of the step and the start of the box.
  4. These two fits are evaluated at the centre of the step. The difference between these values gives an estimate of the step height. However, moving the fitting box slightly can often result in big changes in the line gradient, and thus big changes to the estimate of the jump height. Steps where this is the case are ignored (i.e. left unfixed). To check this, more linear fits are performed at increasing distances from the step, and the consistency of the estimated jump heights from these extra fits is checked. Specifically, the fitting box used in step 2) above is moved one sample earlier, a new fit is performed, and the new fit is used to estimate the value at the centre of the step. The fitting box is then moved again by one sample and a third fit performed, giving a third estimate of the central data value. This process is repeated until we have performed 2*DCFITBOX fits (an efficient algorithm is used to perform these multiple fits). The variance of all these estimates of the central value is found. The same thing is then done to the fit performed in step 3) above, to give us the variance of the central data value determined from the data following the step. The uncertainty in the jump height is then taken to be the square root of the mean of the two variance values, and the jump is ignored if the jump height - determined from steps 2) and 3) - is less than DCTHRESH2 (fixed at 1.5) times the uncertainty in the jump height.
  5. The jump is also ignored if it is less than DCTHRESH3 (fixed at 4.0) times the noise in the original bolometer data. The bolometer noise value used is the RMS of the differences between adjacent bolometer values, excluding aberrant points, and corrected by a factor of 1/sqrt(2).
  6. If the step is not rejected, all the original bolometer values following the centre of the step are adjusted upwards or downwards by the step height.

Once all steps have been corrected, a constant value is added back on to all bolometer samples so that the original mean value in the bolometer is retained.
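The core of the measurement (steps 2 to 4) and the noise estimate of step 5 can be sketched as follows. Again this is only a Python illustration of the method described above, with invented function names; the consistency check over the 2*DCFITBOX shifted fits is omitted:

```python
import numpy as np

def step_height(smoothed, step_start, step_end, dcfitbox=30):
    """Estimate a step height from linear fits either side of the step.

    Each fit uses a box of DCFITBOX samples, with a further gap of
    DCFITBOX samples between the box and the step itself.
    """
    centre = 0.5 * (step_start + step_end)

    # Fit before the step: the box ends DCFITBOX samples before the start.
    lo = step_start - 2 * dcfitbox
    x_pre = np.arange(lo, lo + dcfitbox)
    m_pre, c_pre = np.polyfit(x_pre, smoothed[lo:lo + dcfitbox], 1)

    # Fit after the step: the box begins DCFITBOX samples after the end.
    hi = step_end + dcfitbox
    x_post = np.arange(hi, hi + dcfitbox)
    m_post, c_post = np.polyfit(x_post, smoothed[hi:hi + dcfitbox], 1)

    # Evaluate both fits at the step centre; the difference is the height.
    before = m_pre * centre + c_pre
    after = m_post * centre + c_post
    return after - before

def bolometer_noise(data):
    """Noise estimate used in step 5: RMS of adjacent-sample differences,
    corrected by 1/sqrt(2) because differencing doubles the variance.
    (The exclusion of aberrant points is omitted here.)"""
    d = np.diff(data)
    return np.sqrt(np.mean(d ** 2)) / np.sqrt(2.0)
```

On a noiseless linear ramp containing a clean step, the two extrapolated fits recover the step height exactly.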

2010-03-25

SMURF and quality flagging

As promised in one of the recent operations telecons, a blog post about quality flagging, and how all the new quality flagging statistics can help you understand what's going on.

First some background. The raw data is a cube, containing 2 spatial dimensions, and 1 time dimension. There is a single quality array, with these same dimensions, that is used by the map-maker to keep track of various problem areas that are identified both before and during the iterations. Each element of this array is an unsigned byte, and each bit is a flag. You can list the meaning of each bit using Kappa showqual on an exported "_res" model component (with "qua" also specified in exportndf in the dimmconfig file):

BADDA (bit 1) - "Set iff a sample is flagged by the DA"
BADBOL (bit 2) - "Set iff all data from bolo to be ignored"
SPIKE (bit 3) - "Set iff a spike is detected"
DCJUMP (bit 4) - "Set iff a DC jump is present"
PAD (bit 5) - "Set iff data are padding"
APOD (bit 6) - "Set iff data are apodized/boundary"
STAT (bit 7) - "Set iff telescope was stationary"
COM (bit 8) - "Set iff data common-mode rejected"
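Since each flag is one bit of an unsigned byte, the numeric mask values follow directly from the bit numbers listed above. A small hypothetical Python sketch (SMURF itself handles this internally; the helper names here are invented, and bit 1 is assumed to be the least significant bit, following the showqual numbering):

```python
# Bit numbers as listed by showqual (assumed: bit 1 = least significant bit).
QUALITY_BITS = {
    "BADDA": 1, "BADBOL": 2, "SPIKE": 3, "DCJUMP": 4,
    "PAD": 5, "APOD": 6, "STAT": 7, "COM": 8,
}

def mask_for(*names):
    """Build a byte mask with the named quality bits set."""
    mask = 0
    for name in names:
        mask |= 1 << (QUALITY_BITS[name] - 1)
    return mask

def sample_is_usable(quality_byte):
    """A sample contributes to the map only if no quality bit is set."""
    return quality_byte == 0
```

For example, a sample in the boundary region flagged as both padding and apodized would carry the mask `mask_for("PAD", "APOD")`, i.e. 16 + 32 = 48.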

Note that "BADDA" used to be called "BADSAM", and "COM" is also relatively new (common-mode rejection was using BADDA previously, which got confusing).

When you run the iterative map-maker you will now get some information about the quality. When the first iteration starts you will see something like this:

--- Size of the entire data array ------------------------------------------
bolos : 1280
tslices: bnd:2000(0.2 min), map:150000(12.5 min), tot:152000(12.7 min)
Total samples: 194560000
--- Quality flagging statistics --------------------------------------------
BADDA: 68400000 (35.16%), 450 bolos
BADBOL: 69312000 (35.62%), 456 bolos
SPIKE: 0 ( 0.00%),
DCJUMP: 0 ( 0.00%),
PAD: 2560000 ( 1.32%), 2000 tslices
APOD: 0 ( 0.00%), 0 tslices
STAT: 2560000 ( 1.32%), 2000 tslices
COM: 0 ( 0.00%),
Total samples available for map: 123600000, 64.38% of max


The first part of this output just tells you about the dimensions of the entire chunk that is currently being processed. With a single sub-array, the number of bolos is 1280 (ignoring for a moment how many are actually working), and "tslices" is the number of time slices (200 Hz samples). This second number is broken down: "bnd" is the number of samples in the boundary region -- these are the parts of the data that are flagged as either padding or apodization. "map" is all of the other samples -- this is the theoretical maximum amount of data that could go into the final map if all the bolometers were working and there were no spikes, steps or other glitches. "tot" is simply the sum of the two. The "Total samples" line is then the product of the number of bolometers with the total number of time slices in the array.

The second part of the report gives a breakdown of the quality flagging bits in the data array, after loading in the data and applying the flatfield, but before any of the pre-conditioning steps or the first iteration of the model components. In the quality report the first column of numbers gives the total number of samples with each quality flag, followed by the percentage of the total data array size in brackets. Since both BADDA and BADBOL apply to entire detectors, the third column gives the total number of flagged samples divided by the number of time slices -- giving a number of bolometers. The "PAD" and "STAT" flags are already set since padding and apodization of the data happen as they are loaded. Since all bolometers are flagged at a given time slice, the third column of numbers for these flags gives the number of samples divided by the number of bolometers, giving a number of time slices.
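The derived columns can be checked directly against the sample counts in the example report above with some plain arithmetic (shown here in Python):

```python
# Numbers taken from the example report above.
n_bolos = 1280
n_tslices = 152000                  # bnd (2000) + map (150000)
total = n_bolos * n_tslices         # the "Total samples" line: 194560000

badda_samples = 68400000
pad_samples = 2560000

# BADDA applies to whole detectors: divide by the number of time slices.
badda_bolos = badda_samples // n_tslices      # 450 bolos
# PAD applies to whole time slices: divide by the number of bolometers.
pad_tslices = pad_samples // n_bolos          # 2000 tslices
# Percentages are relative to the whole data array.
badda_pct = 100.0 * badda_samples / total     # 35.16%
```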

After each iteration, a similar report will be provided:


--- Quality flagging statistics --------------------------------------------
BADDA: 68400000 (35.16%), 450 bolos ,change 0 (+0.00%)
BADBOL: 80104000 (41.17%), 527 bolos ,change 10792000 (+15.57%)
SPIKE: 0 ( 0.00%), ,change 0 (+0.00%)
DCJUMP: 2113837 ( 1.09%), ,change 2113837 (+inf%)
PAD: 2560000 ( 1.32%), 2000 tslices,change 0 (+0.00%)
APOD: 5120000 ( 2.63%), 4000 tslices,change 5120000 (+inf%)
STAT: 2560000 ( 1.32%), 2000 tslices,change 0 (+0.00%)
COM: 10640000 ( 5.47%), ,change 10640000 (+inf%)
Total samples available for map: 107990669, 57.79% of max
Change from last iteration: -15609331, -12.63% of previous


At the end of the first iteration, this will report quality flags that were set both during pre-processing and during the first calculation of the model components. While some of the quality bits are static throughout the iterations (BADDA, PAD, APOD), the others can change each time. The fourth column of numbers gives the change, both as an absolute number of samples since the last iteration and as a percentage of the number from the previous iteration (+inf if the previous iteration didn't have any flagged). Here we see that the number of bad bolometers (BADBOL) has increased from 456 to 527 (15%) from the initial list of good bolometers. These extra detectors are almost exclusively rejected during calculation of the common-mode signal (they have a shape that is a significant outlier from all the other detectors). For this reason the increase in COM (in samples) is nearly identical to that of BADBOL. No spikes were flagged during this iteration (SPIKE), but about 1% of the data was flagged as having DC steps (in addition to the steps themselves, a small region around each step is also flagged). Finally, "STAT" identifies parts of the data where the telescope was not moving (threshold speed given by "flagstat" in the dimmconfig file). For these data it only flags the padding at the start and end of the data! (This is clearly not useful behaviour, so I will fix it to ignore those parts of the data in the future.)

At the end of the individual quality stats is a summary for the entire array. "Total samples available for map" is the number of samples with no quality bits set, and it is also expressed as a percentage of the theoretical maximum discussed above for the "Size of the entire data array" at the start. The final line shows how this number has changed since the previous iteration.

In summary: watch the number of samples (and percentage) of the data going into the final map. If we typically have about 50% working bolometers, we should expect a number about that large going into the map at the end! If that number falls precipitously (or goes to 0), this report will help you to identify the problem. The usual culprits are DC step flagging, and common-mode rejection.

2010-03-18

SMURF cookbook

For the benefit of those not at the telecon this morning (or those that zoned out...)

The question was "is there a DR cookbook". The answer is yes, but it is about two months out of date. It should serve as an introduction, with the material in this blog providing an update. Of course, we will bring the cookbook up to date as things stabilise.

I have added a link to the SMURF cookbook in the "Useful links" sidebar section to the right. If you have a current Starlink installation you can also type "showme sc19" for the cookbook and "showme sun258" for the SMURF manual.

2010-03-17

Flatfielding updates

Summary: Flatfield ramps work really well and SMURF can now automatically handle them in the map-maker.

SCUBA-2 bolometers need to be calibrated to understand how they respond to varying signal coming from the sky and astronomical object. The original plan was to calibrate in the dark (shutter closed). The sequence goes something like:

  1. Select a reference heater value, take a dark frame
  2. Choose a new heater setting, take a dark frame
  3. Take a dark frame at the reference heater value
  4. Choose a different heater setting, take a dark frame
  5. Take a dark frame at the reference heater value
and continue until you have covered a reasonable range of heater settings.

As the heater is changed the bolometers read out a different current. Any drifts in the instrument are compensated for by averaging the surrounding reference frames and subtracting, which means you end up with a curve that goes through zero power at the reference heater value. To convert this to a flatfield you either fit a polynomial as a function of measured current (so that you can look up the power), or else use "TABLE" mode and linearly interpolate between the measurements on either side of the measured current. The gradient of the curve (how the bolometer responds to changes in power) is the "responsivity", measured in amps per watt. The responsivity image can be calculated using the SMURF calcflat command.
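The polynomial lookup described above can be sketched in a few lines of Python. This is not the calcflat implementation, just an illustration of the idea under stated assumptions (invented function names; power fitted as a polynomial in measured current, with the responsivity obtained as the reciprocal gradient of that curve):

```python
import numpy as np

def fit_flatfield(current_a, power_w, order=3):
    """Fit power as a polynomial in measured current, so that a measured
    current can later be converted back to a power."""
    return np.polyfit(current_a, power_w, order)

def responsivity(coeffs, current_a):
    """Responsivity in amps per watt: dI/dP = 1 / (dP/dI), evaluated
    along the fitted power-vs-current curve."""
    dP_dI = np.polyval(np.polyder(coeffs), current_a)
    return 1.0 / dP_dI
```

For a perfectly linear bolometer with constant responsivity R, a first-order fit recovers R at every measurement point.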

When you open the shutter the idea is that you "heater track" to the sky. This involves adjusting the heater power so that the sky power detected by the bolometer results in the same measured current as in the dark. We do this by looking at the signal from a set of tracking bolometers, assuming those bolometers are representative of the others on the array. In reality about 80% of the bolometers do read more or less the same signal before and after opening the shutter, but the other 20% end up in a completely different place. This would not be a problem if the responsivity didn't change for those 20%, but unfortunately it does. We verified this by making finely spaced pong maps of Mars covering a 6x6 arcmin area. This takes about 15 minutes but gives us a beam map of every single bolometer. Analysing the Mars images showed that the bolometers with the lowest responsivity also measure a very low integrated flux for Mars, so the calibration does change when the shutter is opened.

The solution for this was to change flatfielding to work on the sky rather than in the dark. This works in just the same way as previously, using reference sky measurements to compensate for drift, and the top plot in the figure shows a sky flatfield that is working pretty much perfectly. Finely-spaced maps of Mars confirm that all the bolometers are calibrated to within 10% with no drop off for the low responsivity bolometers.

At this point things were looking good, but we still had the issue that the sky flat takes a few minutes and really has to be done every time you do a new setup, and probably at least once an hour. Sky flats are also very dependent on observing conditions, as could be seen on 20100310 and a few days beforehand, where the sky was terribly unstable despite brilliantly low opacity (0.03 CSO tau). The middle plot below shows a sky flat on 20100310, and it is immediately obvious that the sky is varying very fast, varying the power over a much larger range than the heater is adjusting for. This flatfield failed to calibrate any bolometers at all, and we had to resort to dark flatfields to get a baseline calibration (with the associated worries described above).

We had known this was going to be an issue, so in the early part of the year we had been modifying the acquisition system to do fast flatfield ramps. Rather than setting the heater, doing an observation, changing the heater, and doing another observation, we can now change the heater value at 200 Hz (currently we take 3 measurements at each setting though). On 20100223 we enabled sky flatfield ramps at the start and end of every single mapping observation, and a few days later we added them to focus, pointing and sky noise observations. The bottom plot shows the flatfield ramp for the observation that immediately followed the discrete sky flatfield shown in the middle plot. There is an issue with the very last ramp, but the flatfielding software in SMURF had no problem calculating a flatfield for 850 bolometers (SMURF does compensate for drift in the reference heater values). The flatfield ramps are going to help enormously with calibration.

Actually using these flatfields in the map-maker took some work but yesterday I committed changes to SMURF so that flatfield ramps will be calculated and used when flatfielding data in the map-maker (and other SMURF commands). All you need to do is give all the files from an observation to SMURF and it will sort everything out.

I have updated the /stardev and /star rsync server in Hilo (64-bit and 32-bit). There is also a new nightly build available for OSX Snow Leopard 64bit in the usual place.

One final caveat, we have not yet calibrated the resistance of each bolometer relative to the nominal 2 ohms. We have taken data by looking at a blackbody source which should give us a way of tweaking the resistances. When this happens the flatfielding will change slightly and maps will need to be remade (although how critical that is will depend on how much we tweak the bolometers). 


2010-03-09

JLS/ACSIS DR telecon – 20100304

Attendance: GAF, JH, CW, PR, JDF, TJ, FE, HST, ACC

Actions from last meeting:
  • JAC to make a more compact and readable QA report format and make this log available to observers/co-Is following nightly reduction via the OMP (as a downloadable file).
    • update: there had been correspondence between JennyH and BradC and although there had been some work done, this item had not been completed before BradC left – ONGOING
  • JLS teams to provide feedback on what should go in parameter file (once JAC have made initial list). – ONGOING
  • BradC to work through SLS pipeline and QA requirements as provided to him (and available on SLS wiki)
    • update: following BradC's departure, this will have to be picked up by his replacement (hopefully by next month) – ONGOING
  • For JLS teams to provide JAC with images/data/log of spikes when they come across them in their data.
    • update: JAC had received some spike examples from SLS. These have been forwarded to DaveB but given his current priorities we may not see any movement on this for a few months yet (summer maybe?) – ONGOING
  • ChristineW to pass on NGS QA and moment map making scripts to JAC. – DONE
    • ACTION: TimJ to look into whether BradC coded this into the pipeline.
  • AntonioC to organise the production of pipeline documentation
    • update: some minor organisation of documentation but this has fallen down the list of priorities since the start of SCUBA-2 commissioning – ONGOING/STALLED
  • AntonioC to email coords asking them to provide email contacts of people who are reducing ACSIS data and are willing to act as ACSIS DR contacts. – DONE
  • JLS coords to provide email contacts of people who are reducing ACSIS data and are willing to act as survey DR contacts (send to FrossieE). – DONE

News from JAC
  • The software group took a big hit to its available effort when BradC left for new employment in February 2010. We are working on buying in some effort from an experienced ex-Starlink programmer to pick up on many of the tasks that Brad left. This person's effort should start becoming available to us from April 2010.
  • since the commissioning of SCUBA-2 started and the subsequent onset of SCUBA-2 Shared Risks Observing, JAC has been inundated with SCUBA-2 data
  • we can now re-reduce data at the JSA across different UT nights to create so-called "project products". This is a very important step and these products are essentially the Advanced Data Products (ADPs) that the JSA will serve.
  • FrossieE continues to submit data reduction jobs to the JSA but is experiencing problems as makecube has been timing out (in project mode) when cubes are very, very large
Report from GBS
  • a meeting of the GBS was held in Leiden in February 2010; discussions were held on the QA at that meeting
  • there are still missing QA steps in the pipeline reduction. For instance, bad scans are propagating through to the final product (bad scans = one or a combination of high rms, many bad pixels, high variation across spectra)
    • ACTION JAC: this is a high priority for the ACSIS pipeline at the JSA and JAC will look into this when effort materialises
Report from NGS
  • no issues reported
  • observing on the ACSIS portion of the survey has completed.
  • the team is reducing the data themselves via student effort; they will then want to upload the final products to the archive
Report from SLS
  • the team is focussing on getting all data reduced, to then attack the QA issues consistently
    • ACTION: GaryF to direct FrossieE to SLS wiki where QA requirements are published
  • PaulR and HollyT have enough data now to compare pipeline reduced data with hand reduced data
    • ACTION : PaulR to forward DR documentation to JAC – DONE

Next meeting May 6th.

2010-03-01

Data processing at CADC

Right now we are having trouble reducing SCUBA-2 data for the JCMT Science Archive (JSA). Briefly, the problems seem to be:

  • The JSA wrapper is reporting success even in some cases of failure
  • Some maps take so long to make that MAKEMAP times out (this will also affect processing on private computers)
  • There is a bizarre NFS problem on the processing nodes that causes required perl modules to "disappear" when the pipeline is looking for them.
We're working through fixing these, so until then availability for products is a bit patchy. The good news is that processing throughput is great, so catching up with the backlog does not seem to be a problem.

2010-02-26

Monitoring changes to the DR software

Yesterday I gave instructions on how to retrieve the newest version of the Starlink software but it may not always be clear to people what changes are being made to the software to decide whether you want to get an update. The source code repositories for the Starlink software and ORAC-DR have RSS feeds that you can monitor in your standard news feed reader (e.g. Google Reader). Click on the RSS icon in the URL bar for the Starlink repository and the ORAC-DR repository.