2010-10-26

Save time and memory: down-sampling your data

I've finally gotten around to adding the ability to "down-sample" the data.

SCUBA-2 data are normally sampled at about 200 Hz (in practice closer to 180 Hz). This specification was chosen so that the telescope could scan at up to 600 arcsec/sec while still fully sampling the 450 um beam. At these rates a single sample spans 600/200 = 3 arcsec on the sky, which is roughly one third of the 450 um beam FWHM.
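As a quick check of the arithmetic above: the angular step between consecutive samples is simply the scan speed divided by the sample rate. The following is a minimal Python sketch; the function name `sample_spacing_arcsec` is hypothetical and not part of SMURF.

```python
def sample_spacing_arcsec(scan_speed_arcsec_per_s: float, sample_rate_hz: float) -> float:
    """Angular distance the telescope moves on the sky between consecutive samples."""
    return scan_speed_arcsec_per_s / sample_rate_hz


# Design specification quoted above: 600 arcsec/s at ~200 Hz.
print(sample_spacing_arcsec(600.0, 200.0))            # 3.0
# Same scan speed at the actual (approximate) ~180 Hz rate.
print(round(sample_spacing_arcsec(600.0, 180.0), 2))  # 3.33
```

At a scan speed of 150 arcsec/s (an illustrative value in the middle of the typical range mentioned below), the same function gives about 0.83 arcsec per sample, well below the 2 arcsec default pixel size at 450 um.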

For most of the observations taken as part of SRO, however, typical scan rates were in the range 100-200 arcsec/sec, so SCUBA-2 samples faster than necessary. In principle we can therefore re-sample the data into longer samples without losing any useful information. SMURF can now do this if you specify, in the configuration file, the smallest angular scale you wish to sample:

450.downsampscale = 2

850.downsampscale = 4

These values are in arcsec, and the settings above match the default pixel size that SMURF uses in each band. SMURF then works out internally, from the original sample rate and the mean slew velocity, how much to down-sample the data.
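The post does not spell out the formula SMURF uses, so here is a hedged sketch of one plausible model: choose the rate at which consecutive samples are `downsampscale` arcsec apart on the sky (mean speed divided by scale), and never go above the original rate. The function `downsampled_rate_hz` is hypothetical, and the mean slew speed of 120 arcsec/s is an inferred value (it is what this model needs to reproduce the 174.6 Hz to 60.0 Hz message quoted further down), not a number from the original data.

```python
def downsampled_rate_hz(orig_rate_hz: float,
                        mean_speed_arcsec_per_s: float,
                        downsampscale_arcsec: float) -> float:
    """Hypothetical model of the down-sampled rate (not SMURF's actual code).

    Picks the rate at which consecutive samples are downsampscale_arcsec
    apart on the sky, but never exceeds the original sample rate.
    """
    if downsampscale_arcsec <= 0 or mean_speed_arcsec_per_s <= 0:
        raise ValueError("scale and speed must be positive in this sketch")
    target_rate_hz = mean_speed_arcsec_per_s / downsampscale_arcsec
    # Down-sampling can only reduce the rate; a fast scan keeps its original rate.
    return min(orig_rate_hz, target_rate_hz)


# Assumed mean slew speed of 120 arcsec/s (inferred, see above).
rate_450 = downsampled_rate_hz(174.6, 120.0, 2.0)
rate_850 = downsampled_rate_hz(174.6, 120.0, 4.0)
print(rate_450, round(174.6 / rate_450, 2))  # 60.0 2.91
print(rate_850, round(174.6 / rate_850, 2))  # 30.0 5.82
```

Under this model the 850 um setting (4 arcsec) halves the rate again relative to 450 um, and a scan at the full 600 arcsec/s with a 2 arcsec scale would ask for 300 Hz, so no down-sampling would occur at all.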

The following example is the default reduction of an Orion map at 450um (20100219_00039) using the dimmconfig_bright_extended.lis configuration file:



And here is the same reduction again, but with 450.downsampscale = 2:



As you can see there is very little difference between the two images.

While makemap is running you will see a message like this:

smf_grp_related: will down-sample from 174.6 Hz to 60.0 Hz

This indicates that the data are being compressed by a factor of about 3 (174.6/60.0 ≈ 2.9). On my 8-core machine the execution time drops from 18 s to 11 s. It is not a full factor of 3 because there is a large fixed start-up cost while the original (not yet down-sampled) data are loaded. Finally, the estimated memory usage drops from 457 MiB to 251 MiB (again, not a full factor of 3, because of other static memory that still has to be allocated).
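To see why the savings fall short of the compression factor, assume a simple two-component model (an assumption made here, not a breakdown given by SMURF): a fixed part that does not change, plus a time-series part that shrinks by exactly the down-sampling factor. Solving the two measurements for the two unknowns gives a rough split. The function `split_fixed_and_scaling` is hypothetical.

```python
def split_fixed_and_scaling(before: float, after: float, factor: float) -> tuple[float, float]:
    """Split a cost into (fixed, scaling) parts under an assumed two-component model.

    Model: before = fixed + scaling, after = fixed + scaling / factor.
    """
    scaling = (before - after) / (1.0 - 1.0 / factor)
    return before - scaling, scaling


factor = 174.6 / 60.0  # down-sampling factor from the makemap message

mem_fixed, mem_ts = split_fixed_and_scaling(457.0, 251.0, factor)
print(f"memory: fixed ~{mem_fixed:.0f} MiB, time-series ~{mem_ts:.0f} MiB")
# memory: fixed ~143 MiB, time-series ~314 MiB

t_fixed, t_ts = split_fixed_and_scaling(18.0, 11.0, factor)
print(f"time: fixed ~{t_fixed:.1f} s, scaling ~{t_ts:.1f} s")
# time: fixed ~7.3 s, scaling ~10.7 s
```

In this model the fixed part is a sizeable fraction of the total for such a short observation; for a long scan the time-series term dominates and the saving approaches the full down-sampling factor.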

This test uses a very small data set. The improvements are more pronounced for long scans, in which the total memory usage is dominated by the time-series data rather than by other statically allocated buffers (such as the map). The gains will also be larger at 850 um than at 450 um (the lower resolution allows a coarser sampling scale, 4 arcsec rather than 2), and for slower scan speeds.
