2010-03-25

SMURF and quality flagging

As promised in one of the recent operations telecons, a blog post about quality flagging, and how all the new quality flagging statistics can help you understand what's going on.

First some background. The raw data is a cube, containing 2 spatial dimensions, and 1 time dimension. There is a single quality array, with these same dimensions, that is used by the map-maker to keep track of various problem areas that are identified both before and during the iterations. Each element of this array is an unsigned byte, and each bit is a flag. You can list the meaning of each bit using Kappa showqual on an exported "_res" model component (with "qua" also specified in exportndf in the dimmconfig file):

BADDA (bit 1) - "Set iff a sample is flagged by the DA"
BADBOL (bit 2) - "Set iff all data from bolo to be ignored"
SPIKE (bit 3) - "Set iff a spike is detected"
DCJUMP (bit 4) - "Set iff a DC jump is present"
PAD (bit 5) - "Set iff data are padding"
APOD (bit 6) - "Set iff data are apodized/boundary"
STAT (bit 7) - "Set iff telescope was stationary"
COM (bit 8) - "Set iff data common-mode rejected"

Note that "BADDA" used to be called "BADSAM", and "COM" is also relatively new (common-mode rejection was using BADDA previously, which got confusing).

When you run the iterative map-maker you will now get some information about the quality. When the first iteration starts you will see something like this:

--- Size of the entire data array ------------------------------------------
bolos : 1280
tslices: bnd:2000(0.2 min), map:150000(12.5 min), tot:152000(12.7 min)
Total samples: 194560000
--- Quality flagging statistics --------------------------------------------
BADDA: 68400000 (35.16%), 450 bolos
BADBOL: 69312000 (35.62%), 456 bolos
SPIKE: 0 ( 0.00%),
DCJUMP: 0 ( 0.00%),
PAD: 2560000 ( 1.32%), 2000 tslices
APOD: 0 ( 0.00%), 0 tslices
STAT: 2560000 ( 1.32%), 2000 tslices
COM: 0 ( 0.00%),
Total samples available for map: 123600000, 64.38% of max


The first part of this output is just telling you about the dimensions of the entire chunk that it is currently processing. With a single sub-array, the number of bolos is 1280 (ignoring for a moment how many are actually working), and "tslices" is the number of time slices (200 Hz samples). This second number is broken down: "bnd" is the number of samples in the boundary region -- these are the parts of the data that are either flagged as padding or apodization. "map" is all of the other samples -- this is the theoretical maximum amount of data that could go into the final map if all the bolometers were working, and there were no spikes, steps or other glitches. "tot" is simply the sum of the two. The "Total samples" line is then product of the number of bolometers with the total number of time slices in the array.

The second part of the report gives a breakdown of the quality flagging bits in the data array, after loading in the data and applying the flatfield, but before any of the pre-conditioning steps, or the first iteration of the model components. In the quality report the first column of numbers gives the total number of samples with those quality flags, followed by percentages of the total data array size in brackets. Since both BADDA and BADBOL apply to entire detectors, the third columns gives the total number of samples that have been flagged, divided by the number of time slices -- giving the numbers of bolometers. "PAD" and "STAT" flags are already set since padding and apodization of the data happens as they are loaded. Since all bolometers are flagged at a given time slice, the third column of numbers gives the number of samples divided by the number of bolometers, giving a number of time slices.

After each iteration, a similar report will be provided:


--- Quality flagging statistics --------------------------------------------
BADDA: 68400000 (35.16%), 450 bolos ,change 0 (+0.00%)
BADBOL: 80104000 (41.17%), 527 bolos ,change 10792000 (+15.57%)
SPIKE: 0 ( 0.00%), ,change 0 (+0.00%)
DCJUMP: 2113837 ( 1.09%), ,change 2113837 (+inf%)
PAD: 2560000 ( 1.32%), 2000 tslices,change 0 (+0.00%)
APOD: 5120000 ( 2.63%), 4000 tslices,change 5120000 (+inf%)
STAT: 2560000 ( 1.32%), 2000 tslices,change 0 (+0.00%)
COM: 10640000 ( 5.47%), ,change 10640000 (+inf%)
Total samples available for map: 107990669, 57.79% of max
Change from last iteration: -15609331, -12.63% of previous


At the end of the first iteration, this will report quality flags that were set both during pre-processing, and the first calculation of the model components. While some of the quality bits are static throughout the iterations (BADDA,PAD,APOD), the others can change each time. The fourth columns of numbers give the changes both as an absolute number of samples since the last iteration, and as a percentage of the number from the previous iteration (+infinity if the previous iteration didn't have any flagged). Here we see that the number of bad bolometers (BADBOL) has increased from 456 to 527 (15%) from the initial list of good bolometers. These extra detectors are almost exclusively rejected during calculation of the common-mode signal (they have a shape that is a significant outlier from all the other detectors). For this reason the increase in COM (in samples) is nearly identical to BADBOL. No spikes were flagged during this iteration (SPIKE), but about 1% of the data was flagged as having DC steps (in addition to the steps themselves, a small region around each step is also flagged). Finally, "STAT" identifies parts of the data where the telescope was not moving (threshold speed given by "flagstat" in the dimmconfig file). For these data it only flags the padding at the start and end of the data! (this is clearly not useful behaviour, so I will fix it to ignore these parts of the data in the future).

At the end of the individual quality stats is a summary for the entire array. "Total samples available for map" is the number of samples with no quality bits set, and it is also expressed as a percentage of the theoretical maximum discussed above for the "Size of the entire data array" at the start. The final line shows how this number has changed since the previous iteration.

In summary: watch the number of samples (and percentage) of the data going into the final map. If we typically have about 50% working bolometers, we should expect a number about that large going into the map at the end! If that number falls precipitously (or goes to 0), this report will help you to identify the problem. The usual culprits are DC step flagging, and common-mode rejection.

1 comment:

Antonio said...

Hi Ed,

Thanks for this. It is very helpful. I am confused by one aspect that I would like to have clarified in my mind.

You say that BADDA and BADBOL apply to and defined as entire bolometers. In the initial step this makes sense as we lose bolometers that have been switched off in the set up or that have failed the flatfield. Couldn't we have BADFLA for the latter and report the sum with BADDA as the total number of lost bolometers?

Your post implies that further losses of samples are also recorded in BADBOL which is what I find confusing and may perhaps be the wrong way of reporting this. You imply that rejections from COM, DCJUMP and SPIKE are also included by BADBOL and hence from then on you are referring to an effective number of bolometers rejected. I don't know what you or others feel, but I would prefer that BADBOL tell us how many actual bolometers have been lost. With SPIKE and DCJUMP we do not lose the whole bolometer but a (hopefully) small fraction of the time slices written out by that bolometer.

These lost samples could be reported by resurrecting BADSAM - but you probably don't want to use a 9th bit! However, you could use your final summary at the bottom to give the total number of bolometers rejected by that iteration and the total number of additional time samples (which you can then also express as an equivalent number of bolometers if you wish, as I grant you, it is an interesting number!).