2008-12-19
Near-minutely raw data movement
A new program[0] now runs continuously, checking (currently) about once a minute for observations to process. If there are any, it feeds the database and then creates symbolic links for CADC's consumption. This should help avoid massive data transfers to CADC twice a day. Note that the previously involved programs will keep running concurrently until everybody involved is satisfied that raw data is being entered and/or moved as desired.
All this started yesterday, on a slightly wet, cloudy Hawaiian evening.
[0] enterdata-cadc-copy.pl is a wrapper around JSA::EnterData & JSA::CADC_Copy modules, which were respectively generated from jcmtenterdata.pl & cadcopy.pl.
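For the curious, the polling behaviour described above amounts to something like the following self-contained sketch. The directory names are hypothetical and the database step is only a placeholder; the real program wraps JSA::EnterData and JSA::CADC_Copy rather than doing any of this directly.

    #!/usr/bin/env perl
    # Minimal sketch of the near-minutely polling loop.  The directories are
    # hypothetical placeholders; the real program uses JSA::EnterData and
    # JSA::CADC_Copy instead of the inline steps shown here.
    use strict;
    use warnings;
    use File::Basename qw(basename);

    my $incoming = '/example/raw/incoming';   # hypothetical raw-data area
    my $cadc_dir = '/example/cadc/new';       # hypothetical CADC pick-up area
    my $interval = 60;                        # check roughly once a minute

    while (1) {
        my @new = grep { !-e "$cadc_dir/" . basename($_) }
                  glob("$incoming/*.sdf");

        for my $file (@new) {
            # Placeholder for the database-feeding step (JSA::EnterData).
            print "Would enter $file into the database\n";

            # Symbolic link for CADC's consumption (the JSA::CADC_Copy step).
            symlink $file, "$cadc_dir/" . basename($file)
                or warn "Could not link $file: $!";
        }

        sleep $interval;
    }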
2008-12-04
QA-enabled pipeline released in Hilo
This version will eventually be released to the summit, where it will give telescope operators feedback on which surveys are suitable to observe, along with enhancing the JCMT Science Archive pipeline.
2008-06-27
SCUBA-2 DR pipeline
2008-04-01
Initial results of "better" ORAC-DR reduction
- integrated intensity map, group coadd: summit / better
- integrated intensity map, single observation: summit / better
- intensity-weighted velocity map, group coadd: summit / better
A few notes:
- By "summit" pipeline I mean the pipeline currently running at the summit. This pipeline will be replaced by an "improved" pipeline pending JCMT support scientist approval. The "improved" pipeline will not be run at CADC, they will run the "better" pipeline that created the "better" images linked above.
- The "group" summit integrated intensity map is not generated by the pipeline, it's created by manually running wcsmosaic to mosaic together the individual baselined cubes (the _reduced CADC products), then collapsing over the entire frequency range. This is how the summit pipeline would create those files, though.
- Ditto for the group summit velocity map, except the pipeline wouldn't even create those in the first place, as it doesn't know which velocity ranges to collapse over to get a proper velocity map. This example is just done by naively collapsing over the entire velocity range. The "better" pipeline automatically finds these regions and creates velocity maps -- not only for the coadded group cube, but also for individual observation cubes.
- The difference between the "better" pipeline (which is what will be running at CADC) and the "improved" pipeline (which is what will be running at the summit) is very small for this given dataset.
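To make that concrete, the manual mosaic-and-collapse procedure amounts to roughly the following, assuming a Starlink environment. The file names and parameter choices are placeholders of mine, not the pipeline's settings.

    #!/usr/bin/env perl
    # Sketch of the manual mosaic-and-collapse steps.  Placeholders:
    # reduced_cubes.lis lists the per-observation _reduced cubes, and axis 3
    # is assumed to be the spectral axis.
    use strict;
    use warnings;

    my $kappa = $ENV{KAPPA_DIR} or die "KAPPA_DIR not set";

    sub run { system(@_) == 0 or die "Command failed: @_\n" }

    # Mosaic the individual baselined cubes (the _reduced CADC products).
    run("$kappa/wcsmosaic in=^reduced_cubes.lis out=group_cube method=nearest");

    # Integrated-intensity map: collapse over the entire frequency range.
    run("$kappa/collapse in=group_cube out=group_iimap axis=3 estimator=integ");

    # Naive velocity map: intensity-weighted coordinate over the whole range.
    run("$kappa/collapse in=group_cube out=group_velmap axis=3 estimator=iwc");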
2008-03-26
CUPID ClumpFind and backgrounds
However, Jenny noted that the clump sizes reported by CUPID were not the same as those reported by IDL. Both implementations use the RMS displacement of each pixel centre from the clump centroid as the clump size, where each pixel is weighted by the corresponding pixel data value, so in principle they should produce the same values. The difference turns out to be caused by the fact that CUPID removes a background level from each clump before using the pixel values to weight the displacements. IDL, on the other hand, uses the full pixel values without subtracting any background. Thus, increasing the background level under a clump will produce no change in the clump sizes reported by CUPID; IDL, however, will report larger clump sizes due to the greater relative emphasis put on the outer edges of the clump.
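In symbols (my notation, not taken from either package's documentation), the size along pixel axis i is a weighted RMS displacement from the data-weighted centroid, and the weighting is the only difference between the two implementations:

    % w_k = d_k - b  (CUPID: background level b removed from each clump first)
    % w_k = d_k      (IDL ClumpFind: full pixel value, no background subtraction)
    \[
      c_i = \frac{\sum_k w_k\, x_{k,i}}{\sum_k w_k},
      \qquad
      S_i = \sqrt{\frac{\sum_k w_k\,\bigl(x_{k,i} - c_i\bigr)^2}{\sum_k w_k}}
    \]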
So should a background be subtracted or not? Having the reported clump size depend on the background level seems an undesirable feature to me. But if you want to compare CUPID results with other systems (e.g. the IDL ClumpFind in this case) that do not subtract a background, then CUPID also needs to retain the background level to get a meaningful comparison. Consequently, I've added a parameter to CUPID:FINDCLUMPS to select whether or not to subtract the background before calculating clump sizes. The default is for the background to be subtracted unless CUPID is emulating the IDL algorithm (as indicated by the ClumpFind.IDLAlg configuration parameter).
If the background is retained in CUPID, Jenny found that the CUPID and IDL clump sizes match to within half a percent. So things look OK.
David
2008-02-29
Preliminary Wrapper script released
- retrieves data files from the supplied list
- converts them to NDF if required
- determines the correct ORAC-DR instrument name based on the data
- checks that PRODUCT information matches for all files
- determines whether to run ORAC-DR or PiCARD (a toy sketch of this decision follows the list)
- converts products back to FITS
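As an illustration of the PRODUCT check and the pipeline choice (not the script's actual code: the file names and the assumption that raw data carries a 'raw' product label are mine):

    #!/usr/bin/env perl
    # Toy version of the PRODUCT-consistency check and the ORAC-DR/PiCARD
    # decision.  Headers are faked in a hash; the real script reads them from
    # the data files, and the 'raw' label here is an assumption.
    use strict;
    use warnings;

    my %product_for = (              # hypothetical input files and products
        'a20080229_00012_01_0001.sdf' => 'cube',
        'a20080229_00012_01_0002.sdf' => 'cube',
    );

    # All files in one request must carry the same PRODUCT value.
    my %seen = map { $_ => 1 } values %product_for;
    die "PRODUCT mismatch: ", join(', ', sort keys %seen), "\n"
        if keys %seen > 1;

    my ($product) = keys %seen;

    # Raw data goes through ORAC-DR; existing products are fed to PiCARD.
    my $pipeline = $product eq 'raw' ? 'ORAC-DR' : 'PiCARD';
    print "Would run $pipeline on ", scalar keys %product_for, " file(s)\n";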
Known issues:
- Provenance is not quite correct: it is possible to refer to a parent that will not be archived.
- There is no standardised approach to logging standard output and standard error.
- dpCapture does not automatically copy products to the CADC transfer directory.
CADC data transfers now working again
2008-02-25
ORAC-DR: CADC+batch mode
- _cube files created. These are then forgotten about by ORAC-DR but are stored by CADC.
- Run the initial steps of Remo's script on the time-series data. This removes any gross time-series signal through collapsing and rudimentary linear baselining.
- Run MAKECUBE using every member observation of a Group, creating tiles.
- Run remainder of Remo's script on each tile, which uses a combination of smoothing and CUPID to create baseline region masks and to remove baselines.
- Take the baseline-region mask from the previous step, along with the original input time-series data, and run them through UNMAKECUBE. This creates time-series masks.
- Apply the time-series masks to the original time-series data.
- Run MFITTREND with a high-order polynomial (or spline, or whatever) on the masked time-series data. These cubes shouldn't have any signal and should be pure baseline.
- Subtract the baselines determined in the previous step from the original time-series data.
- Run MAKECUBE on the baselined time-series data for each observation to create the _reduced / _rimg / _rsp files.
- If necessary, WCSMOSAIC the _reduced files for each observation to create an "even better" group, which can then be used to determine a better mask, possibly iterating through the UNMAKECUBE-to-_reduced generation steps again (a rough sketch of the core of this sequence follows).
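Here is a rough sketch of the mask / baseline / subtract / regrid core of that sequence for a single observation, written as plain Starlink calls. The file names are placeholders, the parameter values are indicative only, and the real work is of course done inside ORAC-DR primitives rather than a standalone script.

    #!/usr/bin/env perl
    # Mask -> baseline -> subtract -> regrid, for one observation.
    # Assumes SMURF_DIR and KAPPA_DIR point at a Starlink installation and
    # that axis 1 of the raw time series is the spectral axis.
    use strict;
    use warnings;

    my $smurf = $ENV{SMURF_DIR} or die "SMURF_DIR not set";
    my $kappa = $ENV{KAPPA_DIR} or die "KAPPA_DIR not set";

    sub run { system(@_) == 0 or die "Command failed: @_\n" }

    # Project the group baseline-region mask back into the time-series domain.
    run("$smurf/unmakecube in=group_blmask ref=raw_ts out=raw_tsmask");

    # Apply the time-series mask to the original time-series data.
    run("$kappa/copybad in=raw_ts ref=raw_tsmask out=masked_ts");

    # Fit a high-order baseline to the (now signal-free) masked data ...
    run("$kappa/mfittrend in=masked_ts out=baseline_ts axis=1 order=5 subtract=false");

    # ... and subtract that baseline from the original time series.
    run("$kappa/sub in1=raw_ts in2=baseline_ts out=bl_ts");

    # Regrid the baselined time series into this observation's _reduced cube.
    run("$smurf/makecube in=bl_ts out=obs_reduced system=tracking autogrid");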
The Wrapper
The wrapper is on TimJ's to-do list, and so is at the mercy of his higher priority SCUBA-2 work. In an attempt to push something out to CADC before working on the SCUBA-2 translator, he is writing a prototype with the following functionality:
- has a stub dpRetrieve, emulating the system that will eventually fetch the data needed from the CADC database
- examines the data to determine whether it is raw or already a product
- converts any FITS files to NDF
- runs ORAC-DR or PiCARD as appropriate given the above information
- converts any NDF products back to CADC-compliant FITS
- calls a stub dpCapture (the real dpCapture imports any products into the CADC system)
The main problem that stops this from being more than a prototype is the provenance system. In our NDF-based systems provenance is a time series: file A turns into file B, which turns into file C, ... eventually resulting in file E, the final product. So the provenance looks like this: A, B, C, D, E. In the CADC system, provenance is the nearest parent existing in the archive. So, if only A, B, D and E exist in the database (because C happens to be an intermediate file of no lasting importance), the provenance for E is D, but the provenance for D is B. Therefore, the wrapper has to make sure that at the end of any processing the provenance is fixed up to list only parents that exist in the CADC archive.
The intended solution for this is for DavidB to commit some NDG patches to allow TimJ/the wrapper to remove C (in the previous example) from the provenance. Also, the wrapper needs to rename A, B, D and E to the CADC naming convention so they can be matched to entries in the archive.
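To make the pruning concrete, here is a toy illustration of the "nearest archived parent" rule using the A-to-E example above; the real fix will use NDG's provenance handling, not anything like this.

    #!/usr/bin/env perl
    # Toy "nearest archived parent" lookup for the A..E example above.
    use strict;
    use warnings;

    my %parent_of = (B => 'A', C => 'B', D => 'C', E => 'D');   # A has no parent
    my %archived  = map { $_ => 1 } qw(A B D E);                # C is never kept

    sub cadc_parent {
        my ($file) = @_;
        my $p = $parent_of{$file};
        # Walk up the chain until we hit something that exists in the archive.
        $p = $parent_of{$p} while defined $p && !$archived{$p};
        return $p;
    }

    printf "%s -> %s\n", $_, cadc_parent($_) // 'none' for qw(E D B);
    # Prints: E -> D, D -> B, B -> A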
So as not to hold other parts of the project up, the intent is for the prototype to be delivered to CADC in the next few days without this provenance-related functionality, and to come back and fix this when the SCUBA-2 translator work allows.
2008-02-13
OMP to CADC connection is down
2008-02-11
specx2ndf now creates variance
2008-02-04
Processing 3D cubes with FFCLEAN
FFCLEAN has been extended so that it can now:
1) process 3D cubes, either as a set of independent 1D spectra or as a set of independent 2D images (see the new parameter AXES)
2) store the calculated noise level in the output variance array (see the new parameter GENVAR)
This was motivated by my experiments with the new smurf:unmakecube command as a means of getting an estimate of the noise level in each residual spectrum.
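A possible invocation on a residual cube, treating each spectrum independently and writing the noise estimate into the variance component. The exact AXES value needed to select per-spectrum processing (axis 3 assumed spectral here) and the other parameter values are my guesses, so check the FFCLEAN documentation.

    #!/usr/bin/env perl
    # Hypothetical FFCLEAN call using the new AXES and GENVAR parameters.
    use strict;
    use warnings;

    my $kappa = $ENV{KAPPA_DIR} or die "KAPPA_DIR not set";

    system("$kappa/ffclean in=residual_cube out=residual_cln " .
           "box=25 clip='[3,3,3]' axes=3 genvar=true") == 0
        or die "ffclean failed";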
2008-02-01
Creating artificial time series from a sky cube
2008-01-31
Time accounting for JLS
Another issue for JLS observing is time accounting: how to account in the OMP for the time spent on data that has been deemed unacceptable. These data do not get charged to the surveys, so if the data were initially QUESTIONABLE, retroactive action will need to be taken to account for that time correctly (notwithstanding the issue of shared calibrations; see later). So that we can track how much time has been REJECTed by each survey, there will be a special project code (e.g. MJLSG00) for each survey, to which REJECT observations will be charged.
The idea is that when the obslog flag changes:
1. an email is triggered to ACC and PIs notifying the change
2. if the change is to REJECT then the release date is automatically changed to TODAY (or equivalent)
3. ACC runs up nightrep for the night in question and changes the time accounting accordingly.
4. changes to flags are propagated to CADC
In a situation where calibrations are provided by the observatory this system should work flawlessly (and a tool which takes care of the time accounting automatically would also be feasible). However, in the current situation where calibrations are shared amongst the projects, it is difficult to do the time accounting properly in this scheme as it is not immediately obvious how much calibration time should be taken with the REJECTed observation. It would have to be recalculated.
How to properly deal with BAD/QUESTIONABLE data within the JLS
The problem stems from what happens to questionable data which remains in some form of limbo until its status is deemed to be GOOD or BAD. Setting data to BAD is an issue in itself as such data are usually BAD because of a fault. However, the Legacy nature of JLS means that some data will be deemed to be unacceptable for the surveys and so should not be processed into Advanced Data Products. These data are not intrinsically BAD and so the plan is for them to be immediately released to the public.
We resolve this issue by having a new quality parameter in obslog - REJECT. Definitions:
GOOD: data is good and is processed by the pipeline
QUESTIONABLE: data may have problems with it - a human needs to look and make a decision on its quality
BAD: data is not good, do not process
REJECT: data does not meet agreed standards for survey
We don’t expect to be using the REJECT flag during normal PI observations. Furthermore, it is expected that with working QA in the survey pipelines the number of REJECTs and QUESTIONABLEs will be small (but there will be enough, especially at the beginning as we're bedding in the system, that we need to deal with them appropriately).
The following tables summarise what happens to data with these flags:
            _cube   _reduced   group   proprietary
GOOD          Y        Y         Y         Y
BAD           Y        N         N         N
QUEST.        Y        Y         N         Y
REJECT        Y        Y         N         N

            ADP    charged   VO/master product
GOOD          Y        Y          Y
BAD           N        N          N
QUEST.        N        Y          N
REJECT        N        N          Y
(N.B. QUESTIONABLE data should not be combined into the public VO product: those data are in an undefined quantum state, and until their wave functions have collapsed into GOOD, BAD or REJECT you don’t know what to do with them.)
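For scripts that need to act on these flags, one purely illustrative way to encode the two tables above is a simple lookup structure; obslog itself remains the authoritative source of the flags.

    #!/usr/bin/env perl
    # Illustrative only: the dispositions from the tables above as a lookup
    # table a processing script might consult.
    use strict;
    use warnings;

    my %disposition = (
        GOOD         => { _cube => 1, _reduced => 1, group => 1, proprietary => 1,
                          adp => 1, charged => 1, vo_master => 1 },
        BAD          => { _cube => 1, _reduced => 0, group => 0, proprietary => 0,
                          adp => 0, charged => 0, vo_master => 0 },
        QUESTIONABLE => { _cube => 1, _reduced => 1, group => 0, proprietary => 1,
                          adp => 0, charged => 1, vo_master => 0 },
        REJECT       => { _cube => 1, _reduced => 1, group => 0, proprietary => 0,
                          adp => 0, charged => 0, vo_master => 1 },
    );

    my $flag = 'REJECT';
    print "Include in group coadd? ",
          ($disposition{$flag}{group} ? "yes" : "no"), "\n";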