2009-05-12

Pipeline running on CADC grid engine

Last week I (Tim J) visited CADC to work on integrating ORAC-DR into the CADC grid engine system. This involved making sure that the pipeline wrapper interfaced properly with CADC and that the data retrieval and data capture routines were given the correct inputs.

On Wednesday 6th May we succeeded in running four jobs in parallel on the compute cluster. This is a terrific result and paves the way for doing the nightly processing at CADC in short order, and then following up with processing of project coadds. In the next few weeks I will be working on the code that will query the data archive and submit job requests to grid engine.

It also means that, in principle, survey teams could request that jobs be submitted to grid engine to make use of the processing nodes.

2009-05-11

CADC network outage

CADC and therefore JCMT data retrievals will be off the air on Wednesday afternoon (HST) due to scheduled network maintenance. If you are having trouble getting through, just try again later.

2008-12-19

Near minutely raw data movement

Raw observation data is now being put into both the jcmt database and the CADC staging area shortly after it arrives.

A new program[0] checks endlessly (currently about every minute) to see if there are any observations to process. If there are, the database is fed first, followed by the creation of symbolic links for CADC's consumption. This should help avoid massive twice-daily data transfers to CADC. Note that the previously used programs will keep running concurrently until everybody involved is satisfied that raw data is being entered and/or moved as desired.

All this started yesterday, on a slightly wet, cloudy Hawaiian evening.

[0] enterdata-cadc-copy.pl is a wrapper around JSA::EnterData & JSA::CADC_Copy modules, which were respectively generated from jcmtenterdata.pl & cadcopy.pl.
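Purely as an illustration of the control flow (the real program is the Perl wrapper above; every helper name and path below is a hypothetical stand-in, not its actual API):

    # Sketch only: poll roughly once a minute; for each new observation,
    # feed the database first, then create a symlink for CADC to pick up.
    while true; do
        for obs in $(list_new_observations); do    # hypothetical helper
            enter_into_jcmt_db "$obs"              # hypothetical helper
            ln -s "$obs" /path/to/cadc/staging/    # placeholder staging path
        done
        sleep 60
    done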

2008-12-04

QA-enabled pipeline released in Hilo

ORAC-DR has been updated in Hilo to include quality-assurance testing. Based on a number of QA tests, observations are given a pass/questionable/fail status. QA is automatically done on all science observations, and survey-specific QA parameters can be given.

This version will eventually be released to the summit, where it will give telescope operators feedback on which surveys are suitable to observe; it will also enhance the JCMT Science Archive pipeline.

2008-06-27

SCUBA-2 DR pipeline

A belated announcement that the SCUBA-2 data reduction pipeline passed its "lab acceptance" earlier this month. Full report at http://docs.jach.hawaii.edu/JCMT/SC2/SOF/PM210/04/sc2_sof_pm210_04.pdf

2008-04-01

Initial results of "better" ORAC-DR reduction

As previously mentioned, ORAC-DR is improving how ACSIS data are reduced. To show the progress between "summit" and "better":

[comparison images: "summit" vs. "better" integrated intensity and velocity maps]
A few notes:
  • By "summit" pipeline I mean the pipeline currently running at the summit. This pipeline will be replaced by an "improved" pipeline pending JCMT support scientist approval. The "improved" pipeline will not be run at CADC, they will run the "better" pipeline that created the "better" images linked above.

  • The "group" summit integrated intensity map is not generated by the pipeline, it's created by manually running wcsmosaic to mosaic together the individual baselined cubes (the _reduced CADC products), then collapsing over the entire frequency range. This is how the summit pipeline would create those files, though.

  • Ditto for the group summit velocity map, except that the pipeline wouldn't create one in the first place, as it doesn't know which velocity ranges to collapse over to get a proper velocity map. This example was made by naively collapsing over the entire velocity range. The "better" pipeline automatically finds these regions and creates velocity maps -- not only for the coadded group cube, but also for individual observation cubes.

  • The difference between the "better" pipeline (which is what will be running at CADC) and the "improved" pipeline (which is what will be running at the summit) is very small for this given dataset.
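For concreteness, here is a minimal sketch of the manual steps described in the notes above, using the KAPPA wcsmosaic and collapse commands from the shell. All file names are placeholders, and the spectral axis is assumed to be axis 3:

    # Mosaic the individual baselined cubes (the _reduced products)
    # into a single group cube.
    wcsmosaic in='*_reduced' out=group_cube ref='!'

    # Integrated intensity map: collapse over the entire spectral range.
    collapse in=group_cube out=group_iimap axis=3 estimator=integ

    # Naive velocity map: intensity-weighted coordinate over the entire
    # velocity range (the "better" pipeline instead finds the emission
    # regions automatically before collapsing).
    collapse in=group_cube out=group_velmap axis=3 estimator=iwc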

2008-03-26

CUPID ClumpFind and backgrounds

Jenny Hatchell has been comparing the CUPID implementation of the ClumpFind algorithm with the IDL implementation by Jonathan Williams. The IDL version differs in one or two significant respects from the original algorithm published in ApJ, so CUPID provides a switch that selects either the published algorithm or the IDL algorithm. If the IDL algorithm is selected, Jenny finds that the IDL and CUPID implementations allocate exactly the same pixels to each clump. Good news. And more good news is that the CUPID implementation is at least an order of magnitude faster than the IDL implementation.

However, Jenny noted that the clump sizes reported by CUPID were not the same as those reported by IDL. Both implementations use the RMS displacement of each pixel centre from the clump centroid as the clump size, where each pixel is weighted by the corresponding pixel data value. So in principle they should produce the same values. The difference turns out to be caused by the fact that CUPID removes a background level from each clump before using the pixel values to weight the displacements. IDL, on the other hand, uses the full pixel values without subtracting any background. Thus, increasing the background level under a clump will produce no change in the clump sizes reported by CUPID. IDL, however, will report larger clump sizes due to the greater relative emphasis put on the outer edges of the clump.
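In symbols (my paraphrase of the prose above, not notation taken from either code base): with pixel positions x_i, clump centroid x_0, and weights w_i, both implementations report

    size = sqrt( sum_i w_i * (x_i - x_0)^2 / sum_i w_i )

where IDL takes w_i = d_i (the raw pixel value) and CUPID takes w_i = d_i - B, with B the clump's background level. Adding a constant background to the data leaves the CUPID weights unchanged, since it is subtracted back out, but it uniformly inflates the IDL weights, boosting the relative contribution of the faint outer pixels and hence the reported size.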

So should a background be subtracted or not? Having the reported clump size depend on the background level seems an undesirable feature to me. But if you want to compare CUPID results with other systems (e.g. the IDL ClumpFind in this case) that do not subtract a background, CUPID also needs to retain the background level to get a meaningful comparison. Consequently, I've added a parameter to CUPID:FINDCLUMPS to select whether or not to subtract the background before calculating clump sizes. The default is for the background to be subtracted unless CUPID is emulating the IDL algorithm (as indicated by the ClumpFind.IDLAlg configuration parameter).
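For illustration, here are the two modes as shell commands. I'm assuming the new parameter is exposed as BACKOFF (check the FINDCLUMPS documentation for the released name); the input and output names are placeholders:

    # Default: subtract each clump's background before computing sizes.
    findclumps in=map out=clumps outcat=clumps.FIT method=clumpfind backoff=yes

    # Emulate the IDL ClumpFind and keep the background in the size
    # calculation, for a like-for-like comparison with the IDL code.
    findclumps in=map out=clumps_idl outcat=clumps_idl.FIT method=clumpfind \
               config='"ClumpFind.IDLAlg=1"' backoff=no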

If the background is retained in CUPID, Jenny found that the CUPID and IDL clump sizes match to within half a percent. So things look OK.

David