2009-11-05
Thanks to the folks at CADC, Dustin Jenkins in particular, JAC now has a really nice interface that allows us to monitor the jobs submitted to their Grid Engine, look for faults, and browse the thumbnails of the products in order to spot problems - or just sit back and admire the results :-)
2009-10-02
Automated advanced processing at CADC
As previously mentioned, the ORAC-DR data reduction pipeline could be run at CADC to generate basic nightly products. This processing used to be run at JCMT on a nightly basis, with the products transferred to CADC.
This processing is now being run at CADC on their processing system. Processing requests are made at 0900 HST every day, and nightly products will be available sometime after that (depending on the amount of processing needed -- scans take longer to reduce than jiggles).
In the near future, effort will be undertaken to process the backlog of ACSIS data.
2009-09-03
Thumbnails in search results

As mentioned previously, the pipeline generates little thumbnails based on the representative image and the representative spectrum products. Now, CADC can show these thumbnails in search results. This hopefully will allow people to quickly identify search results of interest prior to downloading. Clicking on the thumbnails will launch the full preview of the representative image or spectrum as before.
If you have recent data, check it out. We hope to re-reduce the backlog at some point in order to generate products and thumbnails for older data too.
2009-07-13
Database Replication outage
Over the weekend we lost database replication to CADC. Until we can sync up the tables we are unable to transfer any raw data to CADC (since CADC only accepts files that their systems know about), so data from the weekend will not be retrievable. I'll post an update when transfers are enabled again.
UPDATE: The replication server crashed on Friday night. It's now back up and the tables have been synced with CADC. Transfers have been restarted.
2009-07-07
CADC network outage
From John Ouellette at CADC:
The CADC will be undergoing extensive maintenance from July 18th 0800 PDT to July 19th 1800 PDT. All CADC services, including user access and etransfer, will be unavailable during this period.
During the outage, users will be redirected to a web page stating the reason for the outage and, if possible, we will provide a status update during the work.
2009-06-03
Pipeline now generates thumbnails
The data reduction pipeline now automatically generates PNG thumbnails of rimg and rsp files. These thumbnails are generated in three different sizes, 64x64, 256x256, and 1024x1024. Exif information is also written to these thumbnails, embedding the RA, Dec, source name, orientation, and pixel scale. Only the astro: namespace is currently used (see this ROE page for more information).
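As a rough illustration, something along these lines could produce the three sizes with PerlMagick. This is a sketch only: the real pipeline writes a proper astro: namespace as described on the ROE page, whereas this stores the keywords in a PNG comment chunk, and every tag name, value, and file name here is made up.

    use strict;
    use warnings;
    use Image::Magick;

    my $src = 'g20090602_5_reduced_rimg.png';   # hypothetical rimg rendering
    my $img = Image::Magick->new;
    my $err = $img->Read($src);
    die "$err\n" if $err;

    # Illustrative tag names and values only.
    my $meta = join "\n",
        'astro:ra=12:34:56.7',
        'astro:dec=-00:12:34',
        'astro:object=G34.3+0.2',
        'astro:orientation=0.0',
        'astro:pixelscale=6.0';

    for my $size (64, 256, 1024) {
        my $thumb = $img->Clone();
        $thumb->Resize(geometry => "${size}x${size}");
        $thumb->Set(comment => $meta);            # stored as a PNG text chunk
        $thumb->Write(sprintf 'rimg_%dx%d.png', $size, $size);
    }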
Here's a (rather boring) example from last night:
[example thumbnail image]
In the near future they will be sent to CADC for automatic ingest.
2009-05-12
Pipeline running on CADC grid engine
Last week I (Tim J) visited CADC to work on integrating ORAC-DR into the CADC grid engine system. This involved making sure that the pipeline wrapper interfaced properly with CADC and that the data retrieval and data capture routines were given the correct inputs.
On Wednesday 6th May we successfully ran four jobs in parallel on the compute cluster. This is a terrific result and paves the way to doing the nightly processing at CADC in short order, and then following up with processing of project coadds. In the next few weeks I will be working on the code that will query the data archive and submit job requests to grid engine.
It also means that in principle survey teams could request that jobs be submitted to grid engine to make use of the processing nodes.
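For flavour, a submission for one night might look something like this. A sketch only: the qsub flags are standard Grid Engine, but the jsawrapdr invocation and file names are assumptions.

    use strict;
    use warnings;

    my $utdate = '20090506';
    my @cmd = (
        'qsub',
        '-N', "jsa-$utdate",        # job name
        '-o', "logs/$utdate.log",   # stdout log file
        '-j', 'y',                  # merge stderr into stdout
        'jsawrapdr', "--inputs=obs_$utdate.lis",   # invocation is an assumption
    );
    system(@cmd) == 0 or die "qsub submission failed for $utdate\n";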
2009-05-11
CADC network outage
CADC and therefore JCMT data retrievals will be off the air on Wednesday afternoon (HST) due to scheduled network maintenance. If you are having trouble getting through, just try again later.
2008-12-19
Near-minutely raw data movement
Raw observation data is now placed in both the jcmt database and the CADC staging area shortly after it arrives.
A new program[0] checks continuously (currently about every minute) to see if there are any observations to process. If there are, the database is populated, followed by the creation of symbolic links for CADC's consumption. This should help avoid the massive twice-daily data transfers to CADC. Note that the programs previously doing this job will keep running concurrently until everybody involved is satisfied that raw data is being entered and moved as desired.
All this started yesterday, on a slightly wet, cloudy Hawaiian evening.
[0] enterdata-cadc-copy.pl is a wrapper around the JSA::EnterData & JSA::CADC_Copy modules, which were derived from jcmtenterdata.pl & cadcopy.pl respectively.
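Schematically the loop looks like this (a sketch only; the helper subs stand in for the JSA::EnterData and JSA::CADC_Copy interfaces, which are not shown here):

    #!/usr/bin/env perl
    use strict;
    use warnings;

    my $poll = 60;   # seconds; "about every minute"

    while (1) {
        my @obs = new_observations();    # hypothetical: find unprocessed observations
        if (@obs) {
            enter_data(@obs);            # hypothetical: populate the jcmt database
            make_cadc_symlinks(@obs);    # hypothetical: stage links for e-transfer
        }
        sleep $poll;
    }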
2008-02-29
Preliminary Wrapper script released
A first stab at a wrapper script (jsawrapdr) has been released to users in Hilo. It matches the interface specified by CADC.
What it does:
- retrieves data files from the supplied list
- converts them to NDF if required
- determines the correct ORAC-DR instrument name based on the data
- checks that PRODUCT information matches for all files
- determines whether to run ORAC-DR or PICARD
- converts products back to FITS
What it doesn't do yet:
- Provenance is not quite correct. It is possible to refer to a parent that will not be archived.
- There is no standardised approach to logging Standard Output and Standard Error
- dpCapture does not automatically copy products to the CADC transfer directory.
It's enough to get us started.
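Schematically, the dispatch logic is something like the following (a hedged sketch; all the helper subs are hypothetical, not the real jsawrapdr internals):

    use strict;
    use warnings;

    sub process_files {
        my @files = @_;

        # Convert FITS inputs to NDF where required.
        my @ndf = map { /\.fits?$/i ? fits_to_ndf($_) : $_ } @files;   # hypothetical converter

        # All files must carry matching PRODUCT information.
        die "Mismatched PRODUCT information\n" unless products_match(@ndf);   # hypothetical check

        # Raw data gets the full ORAC-DR pipeline, with the instrument name
        # derived from the data; existing products are handed to PICARD.
        my @products = all_raw(@ndf)                                   # hypothetical test
            ? run_oracdr(instrument_from_data(@ndf), @ndf)
            : run_picard(@ndf);

        # Convert products back to FITS for CADC.
        return map { ndf_to_fits($_) } @products;                      # hypothetical converter
    }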
CADC data transfers now working again
Data transfers to CADC are functioning again. We've had real problems reconfiguring replication to CADC and to our standby server (which use different techniques) but now everything seems to be working. Now that CADC have headers from recent observations they will again start accepting our raw data. Transfers have been initiated and are currently complete to 20080215 (there are quite a few files to transfer). Data retrieval requests from users via the OMP will shortly be redirected back to CADC.
2008-02-25
ORAC-DR: CADC+batch mode
A brief mind-dump on the processing steps ORAC-DR will probably take at CADC when run in batch mode:
- _cube files are created. These are then forgotten about by ORAC-DR but are stored by CADC.
- Run initial steps of Remo's script on time-series data. Removes any gross time-series signal through collapsing and rudimentary linear baselining.
- Run MAKECUBE using every member observation of a Group, creating tiles.
- Run remainder of Remo's script on each tile, which uses a combination of smoothing and CUPID to create baseline region masks and to remove baselines.
- Take the baseline region mask from the previous step along with the original input time-series data, and throw them through UNMAKECUBE. This will create time-series masks.
- Apply the time-series masks to the original time-series data.
- Run MFITTREND with a high-order polynomial (or spline, or whatever) on the masked time-series data. These cubes shouldn't have any signal and should be pure baseline.
- Subtract the baselines determined in the previous step from the original time-series data.
- Run MAKECUBE on the baselined time-series data for each observation to create the _reduced / _rimg / _rsp files.
- If necessary, WCSMOSAIC the _reduced files for each observation to create an "even better" group, which can then be used to determine a better mask and then possibly iterate through the UNMAKECUBE to _reduced generation steps.
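In Starlink terms, the middle steps might look roughly like this (a hedged Perl sketch; the task parameters are heavily abbreviated and all file names are invented):

    use strict;
    use warnings;

    sub run { system($_[0]) == 0 or die "failed: $_[0]\n" }

    # Regrid every member observation of the group into tiles.
    run('$SMURF_DIR/makecube in=^timeseries.lis out=group_cube');

    # ... smoothing plus CUPID produce a baseline-region mask, basemask ...

    # Project the mask back into the time-series domain.
    run('$SMURF_DIR/unmakecube in=basemask ref=^timeseries.lis out=mask_ts');

    # Apply the mask, fit high-order baselines to the (signal-free) result,
    # then subtract those baselines from the original time series.
    run('$KAPPA_DIR/mult in1=raw_ts in2=mask_ts out=masked_ts');
    run('$KAPPA_DIR/mfittrend in=masked_ts axis=1 order=5 out=baseline_ts subtract=false');
    run('$KAPPA_DIR/sub in1=raw_ts in2=baseline_ts out=baselined_ts');

    # Re-run MAKECUBE per observation for the _reduced/_rimg/_rsp products.
    run('$SMURF_DIR/makecube in=baselined_ts out=reduced');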
The Wrapper
Background: there is a system called, imaginatively, the wrapper. Its purpose is to wrap the data processing specifics so as to present a generic interface to the CADC data processing system that is under development.
The wrapper is on TimJ's to-do list, and so is at the mercy of his higher priority SCUBA-2 work. In an attempt to push something out to CADC before working on the SCUBA-2 translator, he is writing a prototype with the following functionality:
- has a stub dpRetrieve, emulating the system that will eventually fetch the data needed from the CADC database
- examines the data to determine whether it is raw or already a product
- converts any FITS files to NDF
- runs ORAC-DR or PICARD as appropriate given the above information
- converts any NDF products back to CADC-compliant FITS
- calls a stub dpCapture (the real dpCapture imports any products into the CADC system)
The main problem that stops this from being more than a prototype is the provenance system. In our NDF-based systems provenance is a complete history - file A turns into file B, which turns into file C, ... eventually resulting in file E, the final product. So the provenance looks like this: A, B, C, D, E. In the CADC system, provenance records the nearest parent existing in the archive. So, if only A, B, D and E exist in the database (because C happens to be an intermediate file of no lasting importance), the provenance for E is D, but the provenance for D is B. Therefore, the wrapper has to make sure that at the end of any processing the provenance is fixed so that it lists only parents existing in the CADC archive.
The intended solution for this is for DavidB to commit some NDG patches to allow TimJ/the wrapper to remove C (in the previous example) from the provenance. Also, the wrapper needs to rename A, B, D and E to the CADC naming convention so they can be matched to entries in the archive.
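A toy model of that fix-up logic (purely illustrative; the real implementation works on NDF provenance structures, not Perl hashes):

    use strict;
    use warnings;

    # Direct-parent relationships: A -> B -> C -> D -> E.
    my %parent   = (B => 'A', C => 'B', D => 'C', E => 'D');
    # C is an intermediate file that never reaches the archive.
    my %archived = map { $_ => 1 } qw(A B D E);

    sub nearest_archived_parent {
        my ($file) = @_;
        my $p = $parent{$file};
        $p = $parent{$p} while defined $p && !$archived{$p};
        return $p;
    }

    for my $f (qw(E D B)) {
        printf "%s <- %s\n", $f, nearest_archived_parent($f);
    }
    # Prints:
    #   E <- D   (direct parent is archived)
    #   D <- B   (C is skipped: it is not in the archive)
    #   B <- A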
So as not to hold other parts of the project up, the intent is for the prototype to be delivered to CADC in the next few days without this provenance-related functionality, and to come back and fix this when the SCUBA-2 translator work allows.
2008-02-13
OMP to CADC connection is down
At the tail end of last week we had a hardware failure at the summit with our primary OMP database server. We switched to the new Sybase 15 64-bit servers, but they have not been configured correctly to replicate the JCMT header table to CADC (full database replication is working to the backup 64-bit server in Hilo). Until we get the CADC replication up and running there will be no transfers of raw data to CADC. This is because CADC validates transfers against its copy of the header table and rejects observations that are unknown to them.
We hope to have replication running by early next week, but in the meantime we have reconfigured OMP data retrievals to serve the raw data files from JAC.
2008-01-28
DB replication to CADC
There is some confusion as to where we are with getting replication from the ASE 15 system to CADC. Phone call with anubhav, timj and isabella tomorrow (Tuesday) at 12pm HST.
OMP ACSIS data retrievals
Last week we had a strange problem where data could be retrieved from the OMP from November onwards, but between February and October 2007 data retrievals failed because the OMP sent the wrong filenames to CADC. It turned out that the new test database (running Sybase 15) had been loaded with data up to the end of October, and that was triggering a new logic path through the OMP. Usually, the OMP failed to find any entries in the database and fell back to looking on the data disk for raw data. This always works and always finds the right files. When rows are found in the database there is no need to look on disk (the database is much faster than scanning files), and once the test database was initialised the DB lookups were working properly. The only problem was that the query to the ACSIS database did not return the filename information from the FILES table, and therefore the OMP was forced to guess the filename. For data taken since we renumbered the subsystem numbers the guess was wrong, and CADC were asked to serve files that didn't exist.
I fixed the problem last week and now retrievals work with both database and file lookup. I was able to clean up quite a lot of code in the process, and the Astro::FITS::HdrTrans module was made a little cleverer: it can now tell the difference between a database result and a header read from a file. Apologies to people who experienced retrieval problems over the past 2 weeks.
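The fixed lookup behaves roughly like this (a sketch; both helpers are invented stand-ins for OMP internals):

    # Trust the FILES table when rows exist, fall back to the raw-data
    # disk otherwise, and never guess file names.
    sub raw_filenames {
        my ($obsid) = @_;
        my @files = files_table_lookup($obsid);   # hypothetical FILES-table query
        return @files if @files;                  # database rows carry the real names
        return scan_raw_data_disk($obsid);        # hypothetical disk fallback
    }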
2008-01-25
ACSIS wrapper script
A short telecon is arranged for Monday 12:30 HST (after the regular JSA 12:00 HST CADC team meeting) so that Tim can put forward his ideas (mostly to Sharon) about the ACSIS processing wrapper script. There is some interest in testing this by February 7th, which will be tight.