config module

The config module contains a class that holds the configuration variables used throughout the code.

class config.ConfigVars(test, kind=None, gridsearch=False, n_jobs=2)

Bases: object

Holds config variables. test (see Args) must be passed on instantiation; the other three arguments are optional. Many other attributes are set automatically on instantiation (see Attributes). Note that some of the attributes are ‘generic’: the CLASS placeholder in their string values may be replaced in other scripts with e.g. ‘star’ or ‘galaxy’.
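The CLASS placeholder substitution mentioned above could look roughly like the following sketch. The helper name fill_class and the example path are hypothetical and used only for illustration; the actual attribute values and the way other scripts perform the replacement are not shown in this documentation.

```python
def fill_class(template, kind):
    """Replace the generic CLASS placeholder with a concrete object type."""
    return template.replace("CLASS", kind)


# Hypothetical generic value; the real attribute strings may differ.
generic_path = "output/CLASS_best_labels.csv"

# Build one concrete path per object type listed in the Args section.
paths = {kind: fill_class(generic_path, kind) for kind in ("star", "galaxy", "qso")}
```

Keeping a single generic string and substituting the object type at the point of use avoids storing a separate attribute for each class.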

Args:

test (bool): whether this is a test run

kind (str): set to object type (star, galaxy, or qso)

gridsearch (bool): whether to run Random Forest (RF) gridsearch or not

n_jobs (int): number of cores to run on (default 2 in the signature). Automatically set to half of the available cores
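As a rough illustration of the call signature above, the sketch below uses a stand-in class, ConfigVarsSketch, which is hypothetical and only mirrors the documented arguments; it does not reproduce the real config.ConfigVars or any of its automatically set attributes. The helper half_of_cores is likewise an assumption about how "half of available cores" might be computed.

```python
import os


class ConfigVarsSketch:
    """Hypothetical stand-in that mirrors only the documented signature."""

    def __init__(self, test, kind=None, gridsearch=False, n_jobs=2):
        self.test = test              # required: test run or not
        self.kind = kind              # optional: 'star', 'galaxy' or 'qso'
        self.gridsearch = gridsearch  # optional: run the RF gridsearch or not
        self.n_jobs = n_jobs          # optional: number of cores to run on


def half_of_cores():
    """Assumed rule: half of the available cores, but never fewer than one."""
    return max(1, (os.cpu_count() or 2) // 2)


# Only `test` is mandatory; the remaining arguments fall back to defaults.
conf = ConfigVarsSketch(test=True, kind="star", n_jobs=half_of_cores())
```

With the real module, the equivalent call would presumably be config.ConfigVars(test=True, kind="star"), with the remaining attributes filled in on instantiation.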

Attributes:

test (bool): whether this is a test run

kind (str): set to object type (star, galaxy, or qso)

gridsearch (bool): whether to run Random Forest gridsearch or not

n_jobs (int): number of cores to run on. Automatically set to half of the available cores

base_path (str): base path reference for other paths

random_state_val (int): random seed used for non-deterministic steps (PCA, RF)

catname (str): name of input catalogue (with labels)

catname_predicted (str): name of unseen catalogue for prediction

targetname (str): column name of catname that contains the true labels

classes (list): object types for classification

photo_band_list (list): list of photometric bands used to create the attributes (see combine_type)

combine_type (str): ‘subtract’ or ‘divide’ - sets how to combine the photometric bands (i.e. ‘subtract’ creates colours)

hclass_dict (dict): mapping of numeric values in targetname column to object type

hdbscanclass_dict (dict): numeric labels to be used for HDBSCAN predicted object types

plot_colours_dict (dict): object to colour dict used for plotting

RF_param_grid (list of dict): RF parameter grid to be used for RF gridsearch

RF_top_gridsearch_vals (list): numbers of top-ranked features (by RF importance) to select as input to the hdbscan gridsearch

ncomp_gridsearch_vals (list): numbers of dimensions to reduce to using PCA, as input to the hdbscan gridsearch

min_cluster_size_gridsearch_vals (range / list): min_cluster_size values to try in hdbscan gridsearch

data_output (str): path name for data output

data_file_predicted (str): path to catalogue for prediction

RF_dir (str): path to RF output

hdb_dir (str): path to hdbscan output

sav_mod_dir (str): path to saved models

cons_dir (str): path to consolidation step output

pred_dir (str): path to prediction step output

data_file (str): path to catalogue for training data with labels

RF_best_params_file (str): path to best hyperparameters for RF from RF gridsearch

RF_best_metrics_file (str): path to best metrics for best RF from RF gridsearch

RF_importances (str): path to list of feature importances from RF

HDBSCAN_gridsearch (str): path to output file from hdbscan gridsearch (labels for each setup)

HDBSCAN_gridsearch_performance (str): path to summary of metrics from each setup from hdbscan gridsearch

dendrogram_file (str): path to dendrogram plot from best hdbscan setup

hdbscan_best_setups_file (str): path to summary file of best hdbscan classifier setup

PCA_dimensions_file (str): path to positions in PCA space for each datapoint in training dataset

saved_scaler_model_file (str): path to saved scaler model

saved_pca_model_file (str): path to saved PCA model

saved_hdbscan_model_file (str): path to saved hdbscan model

HDBSCAN_consolidation_summary (str): path to file with summary of classification performance

Catalogue_with_hdbscan_labels (str): path to catalogue with new hdbscan labels appended

hdbscan_best_labels (str): path to best labels for each hdbscan binary classifier

confusion_plots (str): path to confusion plots for consolidated labels

opt_colour_plot (str): path to colour plot of consolidated labels (using optimal consolidation)

alt_colour_plot (str): path to colour plot of consolidated labels (using alternative consolidation)

HDBSCAN_consolidation_summary_predicted (str): path to summary of labels for label prediction on unseen catalogue

Catalogue_with_hdbscan_labels_predicted (str): path to unseen catalogue with predicted labels

pred_colour_plot (str): path to colour plot for predicted labels on unseen catalogue

PCA_dimensions_file_prediction_data (str): path to positions in PCA space for each datapoint in prediction dataset
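The interaction between photo_band_list and combine_type described in the Attributes above can be sketched as follows. This is a minimal illustration, not the package's own code: the function combine_bands, the feature naming, and the choice to combine every pair of bands are all assumptions, and the band names and magnitudes are made-up example values.

```python
from itertools import combinations


def combine_bands(magnitudes, photo_band_list, combine_type):
    """Combine each pair of bands for one object (pairing scheme is assumed)."""
    if combine_type not in ("subtract", "divide"):
        raise ValueError("combine_type must be 'subtract' or 'divide'")
    features = {}
    for a, b in combinations(photo_band_list, 2):
        if combine_type == "subtract":
            # Subtracting two magnitudes gives a colour, e.g. g-r.
            features[f"{a}-{b}"] = magnitudes[a] - magnitudes[b]
        else:
            # Dividing gives a ratio of the two band values instead.
            features[f"{a}/{b}"] = magnitudes[a] / magnitudes[b]
    return features


# Made-up magnitudes for a single object in three bands.
example = {"g": 20.0, "r": 19.5, "i": 19.0}
colours = combine_bands(example, ["g", "r", "i"], "subtract")
ratios = combine_bands(example, ["g", "r", "i"], "divide")
```

Here colours holds the three pairwise differences (g-r, g-i, r-i), matching the note that ‘subtract’ creates colours, while ratios holds the corresponding quotients.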