config module
The config module contains a class that holds the configuration variables used throughout the code.
class config.ConfigVars(test, kind=None, gridsearch=False, n_jobs=2)
Bases: object
Holds config variables. test (see Args) must be passed on instantiation; the other three arguments are optional. Many other attributes are set automatically on instantiation (see Attributes). Note that some of the attributes are ‘generic’: the CLASS placeholder in their string values may be replaced in other scripts with e.g. ‘star’ or ‘galaxy’.
- Args:
test (bool): whether test run or not
kind (str): set to object type (star, galaxy, or qso)
gridsearch (bool): whether to run Random Forest (RF) gridsearch or not
n_jobs (int): number of cores to run on. Automatically set to half of the available cores
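As a rough illustration of the arguments above, the sketch below instantiates a hypothetical stand-in class, ConfigVarsSketch, that only mirrors the documented signature; it is not the real config.ConfigVars and sets none of the other attributes. The helper half_of_available_cores is also an assumption, showing one way a "half of the available cores" value might be derived.

```python
import os


class ConfigVarsSketch:
    """Hypothetical stand-in that mirrors the documented ConfigVars signature."""

    def __init__(self, test, kind=None, gridsearch=False, n_jobs=2):
        self.test = test              # required: whether this is a test run
        self.kind = kind              # optional: 'star', 'galaxy' or 'qso'
        self.gridsearch = gridsearch  # optional: run the RF gridsearch or not
        self.n_jobs = n_jobs          # optional: number of cores to run on


def half_of_available_cores():
    """Assumed helper: half of the available cores, but never fewer than one."""
    # os.cpu_count() can return None, so fall back to a single core.
    return max(1, (os.cpu_count() or 1) // 2)


# Only `test` is mandatory; the remaining arguments fall back to their defaults.
conf = ConfigVarsSketch(test=True, kind="star", n_jobs=half_of_available_cores())
```

With the real class, the equivalent call would presumably be config.ConfigVars(test=True, kind="star"), after which the attributes listed under Attributes become available on the instance.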
- Attributes:
test (bool): whether test run or not
kind (str): set to object type (star, galaxy, or qso)
gridsearch (bool): whether to run Random Forest gridsearch or not
n_jobs (int): number of cores to run on. Automatically set to half of the available cores
base_path (str): base path reference for other paths
random_state_val (int): used for non-deterministic steps (PCA, RF)
catname (str): name of input catalogue (with labels)
catname_predicted (str): name of unseen catalogue for prediction
targetname (str): column name of catname that contains the true labels
classes (list): object types for classification
photo_band_list (list): list of photometric bands used to create the attributes (see combine_type)
combine_type (str): ‘subtract’ or ‘divide’ - sets how to combine the photometric bands (i.e. ‘subtract’ creates colours)
hclass_dict (dict): mapping of numeric values in targetname column to object type
hdbscanclass_dict (dict): numeric labels to be used for HDBSCAN predicted object types
plot_colours_dict (dict): object to colour dict used for plotting
RF_param_grid (list of dict): RF parameter grid to be used for RF gridsearch
RF_top_gridsearch_vals (list): top features from the ranked RF importances, used to select the features passed to the hdbscan gridsearch
ncomp_gridsearch_vals (list): numbers of dimensions to reduce to with PCA before input to the hdbscan gridsearch
min_cluster_size_gridsearch_vals (range / list): min_cluster_size values to try in hdbscan gridsearch
data_output (str): path name for data output
data_file_predicted (str): path to catalogue for prediction
RF_dir (str): path to RF output
hdb_dir (str): path to hdbscan output
sav_mod_dir (str): path to saved models
cons_dir (str): path to consolidation step output
pred_dir (str): path to prediction step output
data_file (str): path to catalogue for training data with labels
RF_best_params_file (str): path to best hyperparameters for RF from RF gridsearch
RF_best_metrics_file (str): path to best metrics for best RF from RF gridsearch
RF_importances (str): path to list of feature importances from RF
HDBSCAN_gridsearch (str): path to output file from hdbscan gridsearch (labels for each setup)
HDBSCAN_gridsearch_performance (str): path to summary of metrics from each setup from hdbscan gridsearch
dendrogram_file (str): path to dendrogram plot from best hdbscan setup
hdbscan_best_setups_file (str): path to summary file of best hdbscan classifier setup
PCA_dimensions_file (str): path to positions in PCA space for each datapoint in training dataset
saved_scaler_model_file (str): path to saved scaler model
saved_pca_model_file (str): path to saved PCA model
saved_hdbscan_model_file (str): path to saved hdbscan model
HDBSCAN_consolidation_summary (str): path to file with summary of classification performance
Catalogue_with_hdbscan_labels (str): path to catalogue with new hdbscan labels appended
hdbscan_best_labels (str): path to best labels for each hdbscan binary classifier
confusion_plots (str): path to confusion plots for consolidated labels
opt_colour_plot (str): path to colour plot of consolidated labels (using optimal consolidation)
alt_colour_plot (str): path to colour plot of consolidated labels (using alternative consolidation)
HDBSCAN_consolidation_summary_predicted (str): path to summary of labels for label prediction on unseen catalogue
Catalogue_with_hdbscan_labels_predicted (str): path to unseen catalogue with predicted labels
pred_colour_plot (str): path to colour plot for predicted labels on unseen catalogue
PCA_dimensions_file_prediction_data (str): path to positions in PCA space for each datapoint in prediction dataset
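To make the relationship between photo_band_list and combine_type more concrete, here is a minimal sketch of how pairwise features could be built from a list of bands. The function combine_bands, the band names g, r and i, and the magnitude values are all hypothetical and chosen for illustration; the actual feature construction in the code base may differ.

```python
from itertools import combinations


def combine_bands(magnitudes, photo_band_list, combine_type):
    """Build one feature per pair of bands (illustrative, not the project's code)."""
    if combine_type not in ("subtract", "divide"):
        raise ValueError("combine_type must be 'subtract' or 'divide'")

    features = {}
    for band_a, band_b in combinations(photo_band_list, 2):
        if combine_type == "subtract":
            # Differences of magnitudes are what astronomers call colours.
            features[f"{band_a}-{band_b}"] = magnitudes[band_a] - magnitudes[band_b]
        else:
            features[f"{band_a}/{band_b}"] = magnitudes[band_a] / magnitudes[band_b]
    return features


# Made-up magnitudes for a single object in three assumed bands.
mags = {"g": 20.0, "r": 19.5, "i": 19.0}
colours = combine_bands(mags, ["g", "r", "i"], "subtract")
ratios = combine_bands(mags, ["g", "r", "i"], "divide")
```

In this sketch, ‘subtract’ yields the colours g-r, g-i and r-i, while ‘divide’ yields the corresponding ratios.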