Pipeline Configuration

Edit the attributes in the user_configuration.yml file to customize how PAL 2.0 runs. The attributes for setting up the pipeline are given below:

Directory Attributes

run_folder
- Address of the cloned repository on your system, ending with a / and preceded by the home anchor
- Example: C:/Users/MatDisc_ML/
- This is a required value; there is no default value
output_folder
- List with the home reference as the first element and, as the second element, the address of the directory where the Bayesian Optimization outputs are saved, ending with a /
- Values = Any string literal for the second element
- Default value = bo_output/
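
As a rough illustration of how a [home, subdirectory] pair such as output_folder can be turned into a single directory string, here is a minimal Python sketch. The helper name resolve_dir and the dictionary layout are hypothetical and are not taken from the PAL 2.0 code; how the pipeline actually joins these values is an assumption.

```python
def resolve_dir(entry):
    """Join a [home, subdirectory] pair into one directory string ending in '/'.

    Hypothetical helper for illustration; not part of PAL 2.0.
    """
    home, sub = entry
    # Normalise the slashes so the result has exactly one '/' between parts.
    return home.rstrip("/") + "/" + sub.strip("/") + "/"


# Assumed in-memory form of the two directory attributes after loading.
config = {
    "run_folder": "C:/Users/MatDisc_ML/",
    "output_folder": ["C:/Users/MatDisc_ML/", "bo_output/"],
}

output_dir = resolve_dir(config["output_folder"])
print(output_dir)  # C:/Users/MatDisc_ML/bo_output/
```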

General Attributes

test_size
- Fraction of the unobserved materials space that needs to be explored by Bayesian Optimization.
- Example: With a total of 100 materials, test_size = 0.9 means that 10 materials are used to train the surrogate models initially, and the remaining 90 are explored by Bayesian Optimization.
- Values = Floating point in (0,1)
- Default value = 0.9
verbose
- Set to True if we want to print the progress of the code and the outputs of the Bayesian Optimization iterations, without the details of the fitting process.
- Values = True or False
- Default value = True
deep_verbose
- Set to True if we want to print the progress of the code and Bayesian Optimization iterations with all details including training of the Gaussian Process models, prior means and Bayesian Optimization iteration outputs.
- Values = True or False
- Default value = False
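
The test_size arithmetic described above can be sketched in a few lines of Python. The function name split_counts is hypothetical, and rounding to the nearest integer is an assumption; the pipeline may round differently when the product is not a whole number.

```python
def split_counts(n_materials, test_size):
    """Return (n_train, n_explore) for a given test_size fraction.

    Hypothetical helper for illustration; rounding rule is an assumption.
    """
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must lie strictly between 0 and 1")
    # Materials left for Bayesian Optimization to explore.
    n_explore = round(n_materials * test_size)
    # Materials used to train the surrogate models initially.
    n_train = n_materials - n_explore
    return n_train, n_explore


print(split_counts(100, 0.9))  # (10, 90)
```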

Input Attributes

dataset_folder
- Folder name suffix to save the Bayesian Optimization output for a given dataset.
- The standard format of the output folder name is: dataset_folder + test_size + p_Run + Run number
- Values = Any string literal
- Default value = newDataset
InputType
- The format in which your dataset is stored.
- Values = Gryffin, PerovAlloys, PALSearch, MPEA
- Default value = Gryffin
- Given below is a table which shows the various input types and their associated file extensions:
  
  InputType      File Extension
  Gryffin        .pkl
  PerovAlloys    .csv
  PALSearch      .xls, .xlsx
  MPEA           .xls, .xlsx
InputPath
- List with the home reference as the first element and, as the second element, the address of the directory where the dataset is saved, ending with a /.
- Values = Name of the directory where the dataset is stored
- Default value = datasets/
InputFile
- Name of the dataset file.
- Values = Filename of the dataset being used
- Default value = perovskites_GRYFFIN.pkl
AddTargetNoise
- Set to True if we want to add a small amount of Gaussian noise to the target property.
- Values = True or False
- Default value = False
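
Because each InputType expects particular file extensions, a quick consistency check between InputType and InputFile can catch a mismatched pair before a run. The sketch below encodes the extension table from this section; the function name check_input_file is hypothetical, and PAL 2.0 itself may not perform such a check.

```python
import os

# Extension table copied from the InputType description above.
ALLOWED_EXTENSIONS = {
    "Gryffin": {".pkl"},
    "PerovAlloys": {".csv"},
    "PALSearch": {".xls", ".xlsx"},
    "MPEA": {".xls", ".xlsx"},
}


def check_input_file(input_type, input_file):
    """Return True if input_file has an extension listed for input_type.

    Hypothetical helper for illustration; not part of PAL 2.0.
    """
    if input_type not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unknown InputType: {input_type}")
    extension = os.path.splitext(input_file)[1].lower()
    return extension in ALLOWED_EXTENSIONS[input_type]


print(check_input_file("Gryffin", "perovskites_GRYFFIN.pkl"))      # True
print(check_input_file("PerovAlloys", "perovskites_GRYFFIN.pkl"))  # False
```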

Feature Selection Attributes

test_size_fs
- Fraction of the data to be used for feature selection.
- When running Bayesian Optimization, this must be set to the same value as the test_size variable mentioned earlier.
- Values = Floating point in (0,1)
- Default value = 0.1
select_features_otherModels
- Set to True if we want to do feature selection of input descriptors for all models other than the Gaussian Process - Neural Network model.
- Values = True or False
- Default value = True
select_features_NN
- Set to True if we want to do feature selection of input descriptors for the Gaussian Process - Neural Network model.
- Values = True or False
- Default value = True
random_state
- This sets the seed for dividing the dataset into train and test sets for feature engineering.
- Values = Any Real Positive Number
- Default value = 40
onlyImportant
- Set to True if we want to output only the features selected from the list of input features.
- Values = True or False
- Default value = False
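
To show why random_state matters, the following sketch draws a feature-selection subset with a fixed seed using only the Python standard library. It is an illustration under assumptions: the function name seeded_split is hypothetical, the pipeline may use a different splitting routine, and test_size_fs is read literally here as the fraction of the data placed in the feature-selection subset.

```python
import random


def seeded_split(items, test_size_fs, random_state):
    """Split items into (selection_subset, remainder) reproducibly.

    Hypothetical helper for illustration; not the PAL 2.0 implementation.
    """
    rng = random.Random(random_state)  # private generator seeded by random_state
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_selected = round(len(shuffled) * test_size_fs)
    return shuffled[:n_selected], shuffled[n_selected:]


first = seeded_split(range(20), 0.1, 40)
second = seeded_split(range(20), 0.1, 40)
print(first == second)                # True: same seed, same split
print(len(first[0]), len(first[1]))   # 2 18
```

Changing random_state generally produces a different subset, while repeating a run with the same value reproduces the split.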

Surrogate Models Training Attributes

train_NN
- Set to True if we want to train the Neural Network model initially, before using the Neural Network as a prior mean to fit the Gaussian Process model.
- Values = True or False
- Default value = True
saveModel_NN
- Set to True if we want to save the Neural Network model in a file after fitting.
- This has to be set to True if we are training the Neural Network model with the given initial data for the first time.
- Values = True or False
- Default value = True
train_GP
- Set to True if we want to train the Gaussian Process models.
- Values = True or False
- Default value = True
predict_NN
- Set to True if we want to use the Neural Network model to do predictions.
- Values = True or False
- Default value = False
