The Pipeline Steps

The three main steps of the pipeline are

  • identifying all sources in all images using Source Extractor
  • cross-matching all sources between all images using SCAMP
  • separating the SSOs from other sources such as stars, galaxies, artifacts, using a filter chain

SExtractor

SExtractor identifies sources in CCD images using the pixel values and outputs catalogue data. Parameters such as the pixel positions, sky coordinates, and instrumental magnitudes are retrieved for each source. Refer to the official documentation and the Guide to SExtractor for more detailed explanations.

SExtractor is highly configurable using a configuration file. The config file ssos.sex that is provided with the pipeline differs slightly from the default version default.sex: the deblending and photometry parameters were adjusted to better deblend SSOs close to other sources. The path to this config file has to be set in the default.ssos file using the SEX_CONFIG parameter. Likewise, SEX_PARAMS has to point to the output parameter file, by default semp/ssos.param.

The SExtractor configuration as set in ssos.sex requires two more files, the convolution filter set via the SEX_FILTER parameter and the neural network for star-galaxy differentiation, set via the SEX_NNW keyword, located in semp/gauss_2.5_5x5.conv and semp/default.nnw respectively.

Unless the path parameters above are set using absolute paths, the pipeline will look for the files in the directory it is executed in.

Before running SExtractor, all input images are checked for valid WCS keywords in their headers. As pre-existing projection parameters interfere with running SCAMP, they are removed by creating a temporary copy of each image and editing its header in place. The original images are not edited.

Finally, using the SCI_EXTENSION parameter, the user has to provide the index of the science extension of the FITS image. Data are commonly provided in multi-extension FITS format, where weight images and other supplementary data are stored in the same file alongside the science data, so the science extension has to be specified explicitly. Multiple extensions are allowed as well, for example if different CCD images are stored in the same file. The valid values are integers, separated by commas if multiple extensions should be analysed, e.g. 0, 1, or 1,2.
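The accepted SCI_EXTENSION values can be illustrated with a short parsing sketch. The function name parse_sci_extension is hypothetical and not part of the pipeline; it only shows how a value such as 1,2 maps to a list of extension indices.

```python
def parse_sci_extension(value):
    """Parse a SCI_EXTENSION string such as '0' or '1,2' into a list of ints."""
    extensions = []
    for part in value.split(","):
        part = part.strip()
        # Only non-negative integer indices are valid extension numbers.
        if not part.isdigit():
            raise ValueError(f"Invalid extension index: {part!r}")
        extensions.append(int(part))
    return extensions


print(parse_sci_extension("1,2"))  # [1, 2]
```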

If you are unsure which extension contains your image, you can trial run SExtractor with the following syntax and check the output catalogues:

sex -c semp/ssos.sex image_file.fits[SCI_EXTENSION]

Note

Searching multiple extensions at the same time only makes sense if their fields of view overlap. Otherwise, running the pipeline on the extensions separately will yield better results.

The SExtractor output catalogues will be called image_file_$SCI_EXTENSION.cat [1]. After the SExtractor run, the input images are checked for the MJD-OBS header keyword. If it does not exist, the DATE-OBS keyword is read, converted to MJD, and saved in an .ahead file with the same filename structure as the SExtractor catalogue. This additional header file is important for the subsequent SCAMP run.
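The DATE-OBS to MJD conversion can be sketched with the standard library alone. This is a minimal illustration, assuming DATE-OBS is an ISO 8601 string in UTC without a timezone suffix; the function names are hypothetical, and the .ahead filename is an assumption derived from the catalogue naming described above.

```python
from datetime import datetime

# Zero point of the Modified Julian Date: 1858-11-17T00:00:00 (JD 2400000.5).
MJD_EPOCH = datetime(1858, 11, 17)


def date_obs_to_mjd(date_obs):
    """Convert an ISO 8601 DATE-OBS string (assumed UTC) to MJD."""
    observed = datetime.fromisoformat(date_obs)
    return (observed - MJD_EPOCH).total_seconds() / 86400.0


def ahead_filename(image_stem, sci_extension):
    """Assumed .ahead name, mirroring image_file_$SCI_EXTENSION.cat."""
    return f"{image_stem}_{sci_extension}.ahead"


print(date_obs_to_mjd("2000-01-01T12:00:00"))  # 51544.5
print(ahead_filename("image_file", 1))         # image_file_1.ahead
```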

SCAMP

SCAMP provides the astrometric solution for the pipeline: the SExtractor catalogues from the first step sharing the same field-of-view (FoV) are matched against each other using a reference catalogue, and the source coordinates are solved for translation, rotation, and distortion of the original images. This way, sources detected in several images over different epochs can be traced from one image to another, and their detections are linked. Again, a more detailed explanation can be found in the official documentation.

Much like SExtractor, SCAMP is highly configurable. The path to the supplied configuration file ssos.scamp has to be set using the SCAMP_CONFIG parameter in the default.ssos file.

Once SCAMP has matched the SExtractor catalogues, it creates, among others, two catalogues: full_1.cat and merged_1.cat. The former contains all source detections of all images, giving positions in pixel and sky coordinates, fluxes, etc., while the latter lists the properties of the merged (linked) detections, i.e. proper motion and other properties of all sources which were traced over several images. Both catalogues are used in the subsequent analysis, especially the full catalogue, as it holds the important information on the movement of the sources over time.

Note

The SExtractor and SCAMP runs are the most computationally demanding parts of this pipeline and therefore the bottlenecks in execution time. To allow for quick pipeline runs when searching for the optimal settings, the script checks for the existence of the output catalogues before running the software. If the catalogues already exist, these steps are skipped. This behaviour can be overridden by setting the --sex, --scamp, and --swarp flags in the pipeline call.
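The skip logic described in this note amounts to a simple existence check. The sketch below is an illustration only: the function name needs_run and the force argument are hypothetical stand-ins for the pipeline's internal handling of the --sex, --scamp, and --swarp flags.

```python
import os


def needs_run(catalogue_paths, force=False):
    """Return True if a step should run: forced, or any output is missing."""
    if force:
        # Corresponds to the user explicitly requesting a re-run.
        return True
    return not all(os.path.exists(path) for path in catalogue_paths)
```

With this check, a second pipeline call on the same data would only repeat the expensive steps if a catalogue was deleted or a re-run was requested.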

The reference catalogue used by SCAMP can be used to identify stars and galaxies in the images. Removing these sources from the catalogues before SCAMP associates the source detections over epochs can increase the chance of successful detections. Setting the REMOVE_REF_SOURCES parameter to true will execute SCAMP twice. The first run uses a cross-match radius of 1 arcsecond, linking all star and galaxy detections (and any other source detections within that radius). Sources which were matched a certain number of times and to a source in the reference catalogue are then flagged in the SExtractor catalogues. Next, the astrometric solution derived by SCAMP is added to the .ahead files. SCAMP is then re-run with the astrometric solution provided and the cross-match radius as set in the configuration file. In this second run, only transient sources should be included in the pattern matching sample.

Filter Chain

All filter steps are optional and can be turned on/off and adjusted via the default.ssos configuration file.

Filter by Number of Detections

Setting: FILTER_DETECT | Parameters: DETECTIONS

All sources whose number of detections equals one of the values listed in the DETECTIONS parameter are removed. By default, DETECTIONS is 1,2,3, removing all sources with fewer than 4 detections. Removing sources with only 1 or 2 detections is always recommended, as their motion cannot be judged. It is, however, not enforced by the pipeline.
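A minimal sketch of this filter, mirroring the default behaviour of removing all sources with fewer than 4 detections. The data layout (a list of tuples of source number and payload) and the function name are assumptions made for illustration.

```python
from collections import Counter


def filter_by_detections(detections, rejected_counts=(1, 2, 3)):
    """Drop all detections of sources whose detection count is rejected.

    detections: list of (source_number, payload) tuples (hypothetical layout).
    """
    counts = Counter(source for source, _ in detections)
    return [d for d in detections if counts[d[0]] not in rejected_counts]


detections = [(1, "a"), (1, "b"), (1, "c"), (1, "d"), (2, "e"), (2, "f")]
print(filter_by_detections(detections))
# [(1, 'a'), (1, 'b'), (1, 'c'), (1, 'd')]
```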

As a rule of thumb, artifacts such as random CR associations tend to have fewer detections than SSOs, which in turn have fewer detections than stars and galaxies. Increasing the required number of detections is an effective way to clean the sample, though at the cost of possibly losing faint SSOs and SSOs in the edge regions of the images.

Filter by Proper Motion Range

Setting: FILTER_PM | Parameters: PM_LOW, PM_UP, PM_SNR

All sources with proper motions lower than PM_LOW or larger than PM_UP are rejected. Furthermore, the lower limit of the signal-to-noise ratio (SNR) of the proper motion measurement performed by SCAMP can be set using PM_SNR. SCAMP performs a linear fit of the source coordinates over time to determine the proper motion. Large uncertainties in this fit indicate sources which do not move with the constant proper motion expected of SSOs.

Effectively, the SNR lower limit introduces a lower limit on the proper motion as well. If the displacement of an SSO over the exposure time is of the order of the seeing, its measured position will fluctuate strongly and SCAMP will assign a large error to its proper motion measurement.
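The two criteria of this filter can be combined as in the following sketch. The function and argument names are hypothetical, the proper motion and its uncertainty are assumed to share the same unit as PM_LOW and PM_UP, and treating a non-positive uncertainty as a rejection is an assumption rather than documented pipeline behaviour.

```python
def passes_pm_filter(pm, pm_err, pm_low, pm_up, pm_snr):
    """Return True if the proper motion is within range and significant."""
    if not (pm_low <= pm <= pm_up):
        return False
    if pm_err <= 0:
        # Assumption: a non-positive uncertainty is treated as unusable.
        return False
    return pm / pm_err >= pm_snr
```

A slowly moving source with a large fitted uncertainty fails the SNR criterion even if its proper motion lies above PM_LOW, which is the effective lower limit described above.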

Filter by Bad Pixel

Setting: FILTER_PIXEL | Parameters: DELTA_PIXEL

If all detections of a single source fall within a range of DELTA_PIXEL pixels in both the XWIN_IMAGE and YWIN_IMAGE parameters, the source is rejected. Bad CCD pixels can be falsely interpreted as sources by SExtractor and SCAMP. Due to the dithering pattern, they appear to move perfectly linearly and with a constant proper motion. SExtractor parameters like DETECT_MINAREA can be used to clean these sources, but increasing the minimum pixel area per source can also reject faint SSOs. The filter chain therefore also offers this rudimentary bad pixel rejection.
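One way to express this check is sketched below. It assumes that the "range" is the difference between the largest and smallest pixel coordinate and that the comparison is inclusive; both are assumptions, as is the function name.

```python
def is_bad_pixel_candidate(x_positions, y_positions, delta_pixel):
    """True if all detections lie within delta_pixel in both x and y."""
    # x_positions / y_positions stand for XWIN_IMAGE / YWIN_IMAGE values.
    x_range = max(x_positions) - min(x_positions)
    y_range = max(y_positions) - min(y_positions)
    return x_range <= delta_pixel and y_range <= delta_pixel
```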

Filter by Motion

Setting: FILTER_MOTION | Parameters: IDENTIFY_OUTLIER, OUTLIER_THRESHOLD, R_SQU_M

The motion filter is the most effective and strictest filter. A linear fit is applied to both the RA and the DEC coordinates against observation epochs. If the R^2 goodness-of-fit parameter of both fits is equal to or larger than the user-defined R_SQU_M parameter (0 <= R^2 <= 1), the source is accepted. If either fit falls below the limit, the source is rejected. Setting R_SQU_M between 0.95 and 1 imposes very strict rules on the motion. Slow-moving SSOs (proper motion of the order of the seeing) might be missed if R_SQU_M is too large, while a lower setting will increase the number of artifacts surviving the pipeline.

The filter is effective in sorting out stars and galaxies from the sample, as they are stationary over the period of time, and the centroid position found by SExtractor will randomly fluctuate within the order of the seeing.
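The acceptance criterion can be sketched with an ordinary least-squares fit in pure Python. This is an illustration under stated assumptions, not the pipeline's implementation: the function names are hypothetical, and returning R^2 = 0 for a degenerate fit (a single epoch or a constant coordinate) is a choice made here for the example.

```python
def r_squared(x, y):
    """Coefficient of determination of a least-squares line fit y = a*x + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or ss_tot == 0:
        # Degenerate case: no motion can be fitted (assumed worst score).
        return 0.0
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    return 1.0 - ss_res / ss_tot


def passes_motion_filter(epochs, ra, dec, r_squ_m):
    """Accept only if both the RA and the DEC fit reach R_SQU_M."""
    return r_squared(epochs, ra) >= r_squ_m and r_squared(epochs, dec) >= r_squ_m
```

A source drifting steadily in both coordinates yields R^2 close to 1 in both fits, whereas a stationary source whose centroid jitters within the seeing yields R^2 close to 0 and is rejected.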

Further, SCAMP's flagging of non-linearly moving sources is used to reject candidates: if two or more detections of a source, or at least 20% of its detections, are flagged as 64, the source is rejected.

Problems arise when the observations span multiple hours or nights. If, for example, the survey images cover one area of the sky for the whole night with 50 exposures, an SSO may be observed only in the first and the last 5 exposures. Such a long baseline with no observations in between will almost always yield a perfect linear fit. The same is true for random associations of sources, e.g. two stars close together or a star and several CRs. Again, the linear motion filter will be fooled by the large baseline of observations. To tackle this problem, the IDENTIFY_OUTLIER option was introduced. If True, the motion filter starts by detecting outliers in epoch-space within the detections of one source. This is achieved using the Median Absolute Deviation (MAD) of the observation epochs E,

\[\mathrm{MAD} = \mathrm{median}(|E_{i} - \mathrm{median}(E)|)\]

The MAD is the median time difference between the individual observations and the median observation epoch. As the median is robust against outliers, it can be used to identify jumps in the epochs. If the time difference between any two observations is larger than MAD*OUTLIER_THRESHOLD, the source detections are split into subgroups at that jump; several jumps yield several subgroups. Each subgroup with at least 3 detections is then checked for linear motion using the fitting procedure described above. If any subgroup fails the linear motion test, the source is discarded. A subgroup with fewer than 3 detections is only discarded if the other subgroups fail the linear motion test or if none of the other subgroups contains a sufficient number of detections either.
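The epoch splitting can be sketched as follows. The sketch assumes that "any two observations" refers to consecutive observations after sorting by epoch; the function names are hypothetical and the subsequent linear fits per subgroup are not included.

```python
from statistics import median


def median_absolute_deviation(epochs):
    """MAD = median(|E_i - median(E)|)."""
    centre = median(epochs)
    return median(abs(e - centre) for e in epochs)


def split_by_epoch_jumps(epochs, outlier_threshold):
    """Split sorted epochs into subgroups at gaps larger than MAD * threshold."""
    ordered = sorted(epochs)
    limit = median_absolute_deviation(ordered) * outlier_threshold
    groups = [[ordered[0]]]
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > limit:
            groups.append([current])  # a jump starts a new subgroup
        else:
            groups[-1].append(current)
    return groups


epochs = [0, 1, 2, 3, 4, 5, 6, 50, 51]
print(median_absolute_deviation(epochs))  # 2
print(split_by_epoch_jumps(epochs, 2))
# [[0, 1, 2, 3, 4, 5, 6], [50, 51]]
```

In this example the second subgroup holds only 2 detections, so only the first subgroup would be passed on to the linear motion test.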

All source detections which were identified as outliers in epoch space get +1 added to their FLAGS_SSOS parameter. If a source contains “only outliers” (e.g. two pairs of two detections with a large gap in between), the source is removed.

Todo

Add figures of fits, outliers, subgroup fits

Note

The linear motion filter requires the detection filter both to be active and to filter at least all sources with 1 or 2 detections.

Filter by Trail Consistency

Setting: FILTER_TRAIL | Parameters: RATIO

Assuming roughly constant exposure time and seeing conditions, the SSO trail in the images should not vary in size. This is expressed by the AWIN_IMAGE and BWIN_IMAGE SExtractor parameters, which are the semi-major and semi-minor axes of the ellipse fitted to the source. Varying size parameters indicate an association of random sources (e.g. cosmic ray + star). This filter compares the standard deviation of both AWIN_IMAGE and BWIN_IMAGE of all detections of one source against the weighted average uncertainty,

\[\mathrm{\texttt{RATIO}} = \frac{\overline{\sigma_{w}} }{ \sigma_{x}}, \qquad x \in \{\verb|AWIN_IMAGE|, \verb|BWIN_IMAGE|\}\]
\[\overline{\sigma_{w}} = \Big( \sqrt{\sum_i w_{i,x}} \Big)^{-1}\]
\[w_{i,x} = \frac{1}{\sqrt{\mathrm{var}_{i,x}}}, \qquad \mathrm{var}_{x} \in \{\verb|ERRAWIN_IMAGE|, \verb|ERRBWIN_IMAGE|\}\]

and removes sources which show standard deviations larger than the RATIO parameter allows for.
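The ratio can be computed as in the sketch below, which follows the formulas above literally. Several points are assumptions: the function names, the use of the population standard deviation for the scatter of the sizes, and returning infinity when the sizes do not vary at all. How the computed value is compared to the RATIO parameter is left to the pipeline.

```python
import math
from statistics import pstdev


def weighted_mean_uncertainty(errors):
    """Weighted average uncertainty with weights w_i = 1 / sqrt(var_i)."""
    weights = [1.0 / math.sqrt(err) for err in errors]
    return 1.0 / math.sqrt(sum(weights))


def trail_ratio(sizes, errors):
    """Ratio of weighted average uncertainty to the scatter of the sizes.

    sizes: AWIN_IMAGE or BWIN_IMAGE values of all detections of one source.
    errors: matching ERRAWIN_IMAGE or ERRBWIN_IMAGE values.
    """
    scatter = pstdev(sizes)
    if scatter == 0:
        return math.inf  # perfectly constant size (assumed handling)
    return weighted_mean_uncertainty(errors) / scatter
```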

Note

By default, this filter is disabled.

Filter by Bright Sources

Setting: FILTER_BRIGHT_SOURCES | Parameters: DISTANCE, MAG_LIMITS, BRIGHT_SOURCES_CAT

Bright sources tend to introduce numerous artifacts like diffraction spikes and reflection ghosts into images. As the position of these artifacts depends on the camera geometry and pointing, they tend to follow the dithering pattern and display linear movement over all observation epochs. Sources close to bright stars therefore tend to contain a large fraction of these artifacts, and can be rejected with this filter. The reference catalogue can be set to REFCAT, meaning that the SCAMP reference catalogue is used. In this case, it is retrieved to a local file. Alternatively, a path to a local file can be provided, such as the HYG database. The DISTANCE parameter sets the radius in arcseconds around bright stars within which all sources are removed. The MAG_LIMITS setting applies upper and lower magnitude limits to the sources in the catalogue before the filtering.

The local reference catalogue has to be in either CSV or FITS file format. The columns have to be named RA, DEC (both in degrees), and MAG. If BRIGHT_SOURCES_CAT is set to REFCAT, no further adaptations are necessary.
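The proximity check behind this filter can be sketched with a great-circle separation. The function names and the in-memory catalogue layout (a list of dictionaries with the RA, DEC, and MAG columns named above) are assumptions for illustration; reading the CSV or FITS file is not shown.

```python
import math


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine) of two positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a)))) * 3600.0


def near_bright_source(ra, dec, catalogue, distance, mag_limits):
    """True if (ra, dec) lies within distance arcsec of a catalogue source
    whose magnitude is inside mag_limits = (lower, upper)."""
    mag_low, mag_up = mag_limits
    for star in catalogue:
        if not (mag_low <= star["MAG"] <= mag_up):
            continue  # source excluded by MAG_LIMITS
        if angular_separation_arcsec(ra, dec, star["RA"], star["DEC"]) <= distance:
            return True
    return False
```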

Optional Analyses

SkyBoT Cross-match

Setting: CROSSMATCH_SKYBOT | Parameters: CROSSMATCH_RADIUS, OBSERVATORY_CODE, FOV_DIMENSIONS

The SkyBoT database is queried for SSOs in the field-of-view defined by FOV_DIMENSIONS and the centre coordinates of each exposure for each observation epoch. The query result is saved as the file skybot/query_string.XML. The positions of all SSO candidates are then compared to the predicted positions of known SSOs, and if a match is found within the CROSSMATCH_RADIUS (in arcseconds), the predicted SkyBoT parameters are added to the source metadata in the database.

The SkyBoT matching is performed on a detection-basis: each single source detection is cross-matched and the closest SkyBoT match is saved.

If more than one known SSO is associated with an SSO candidate, the measured and predicted proper motions are compared, and the match with the smallest difference in the proper motion angle is chosen. Source detections with no match will carry the name and designation of the SSO associated with the remaining detections, while the remaining SKYBOT fields will be empty.
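Choosing between several SkyBoT matches by proper motion angle requires an angle difference that wraps around 360 degrees. The sketch below illustrates this; the function names and the candidate layout (tuples of name and predicted angle in degrees) are hypothetical.

```python
def angle_difference(angle_a, angle_b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    diff = abs(angle_a - angle_b) % 360.0
    return 360.0 - diff if diff > 180.0 else diff


def pick_skybot_match(measured_angle, candidates):
    """Return the candidate whose predicted angle is closest to the measured one.

    candidates: list of (name, predicted_angle) tuples (hypothetical layout).
    """
    return min(candidates, key=lambda c: angle_difference(measured_angle, c[1]))


print(angle_difference(350.0, 10.0))  # 20.0
print(pick_skybot_match(355.0, [("A", 170.0), ("B", 5.0)]))  # ('B', 5.0)
```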

The FOV_DIMENSIONS parameter has to be defined as described on the SkyBoT webpage: a string of the format “YxZ”, where Y and Z are the image dimensions (integer or floating-point values) in degrees.
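A small validation sketch for the “YxZ” format is given below. The function name is hypothetical, and requiring both dimensions to be positive is an assumption not stated by the format description.

```python
def parse_fov_dimensions(value):
    """Parse a 'YxZ' string into a (Y, Z) tuple of floats in degrees."""
    parts = value.lower().split("x")
    if len(parts) != 2:
        raise ValueError(f"Expected format 'YxZ', got {value!r}")
    y, z = (float(part) for part in parts)
    if y <= 0 or z <= 0:
        # Assumption: a field of view must have positive extent.
        raise ValueError("FOV dimensions must be positive")
    return y, z


print(parse_fov_dimensions("1.5x2"))  # (1.5, 2.0)
```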

Cutout Extraction with SWARP

Setting: EXTRACT_CUTOUTS | Parameters: SWARP_CONFIG, CUTOUT_SIZE

SWARP is used to create cutout images of each SSO detection, with the side length in pixels given by CUTOUT_SIZE. The cutouts are saved in the format cutouts/SOURCE_NUMBER__CATALOG_NUMBER.fits. Using e.g. imagemagick, these cutouts can be quickly turned into short animations for visual confirmation of their nature. The SWARP_CONFIG file is used to configure the cutout extraction.

Compute Fixed Aperture Magnitudes

Setting: FIXED_APER_MAG | Parameters: REFERENCE_FILTER, CUTOUT_SIZE

To measure SSO colours, the magnitudes in different bands have to be computed using fixed apertures. In the mandatory SExtractor part of the pipeline, the magnitudes are measured with variable Kron apertures. This step uses the cutout images of SSOs to apply SExtractor in dual-image mode: one exposure is used to detect objects and compute the apertures, whereas the other is used for flux measurements. The detection image should be the deepest exposure available for best results. Its band can be chosen using the REFERENCE_FILTER parameter, which has to be equal to the FILTER keyword of the detection image. As not all source candidates are necessarily observed in this band, the value can be set to multiple bands, separated by commas. The script will then prioritize the source detections according to the order specified in this value, e.g. FILTER1,FILTER2,FILTER3. The source detection chosen as reference detection this way is flagged by adding 2 to the FLAGS_SSOS output parameter.

After the fixed aperture magnitudes are calculated, the columns MAG_CI and MAGERR_CI are added to the database.

If the cutout extraction with SWARP was set to False, the cutouts will be created in this step and saved to a temporary folder, which is deleted after the pipeline finishes.

Create checkplots

Setting: CHECKPLOTS | Parameters: SKYBOT_PM, SKYBOT_RESIDUALS

Setting the CHECKPLOTS value to either parameter or a comma-separated list of them will create the checkplots in case any object was matched with a known asteroid.

Convert to MPC submission format

The ssos pipeline output can be converted to the MPC 80-column format using

$ ssos --mpc path/to/output/csv

Flags

The FLAGS_SSOS parameter is used to highlight sources which pass the filters but might be artifacts. An example is a source with jumps (outliers) in its observation epochs, which can fool the linear motion filter. The flag values are represented by powers of 2 and added together, allowing for multiple flags to be set at the same time. The flag values are:

Integer Value   Meaning
1               Source detection is an outlier in EPOCH
2               Source detection used as reference for fixed aperture magnitude measurement
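
Because the flag values are powers of 2, a combined FLAGS_SSOS value can be decoded with bitwise operations. The sketch below illustrates this for the two values listed in the table; the constant and function names are hypothetical.

```python
OUTLIER_IN_EPOCH = 1
REFERENCE_FOR_FIXED_APERTURE = 2

FLAG_MEANINGS = {
    OUTLIER_IN_EPOCH: "outlier in EPOCH",
    REFERENCE_FOR_FIXED_APERTURE: "reference for fixed aperture magnitude",
}


def decode_flags(flags_ssos):
    """Return the meanings of all flag bits set in a FLAGS_SSOS value."""
    return [text for bit, text in FLAG_MEANINGS.items() if flags_ssos & bit]


print(decode_flags(3))
# ['outlier in EPOCH', 'reference for fixed aperture magnitude']
```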

[1] Appending the [SCI_EXTENSION] bit after .cat confuses the popular TOPCAT tool, so consistency in naming was neglected here.