Pipeline Configuration
Edit the attributes in the user_configuration.yml file to customize how PAL 2.0 runs.
The attributes for setting up the pipeline are given below:
Directory Attributes
- run_folder
Address of the cloned repository on your system, ending with a / and preceded by the home anchor.
Example: C:/Users/MatDisc_ML/
This is a required value; there is no default.
- output_folder
List with the home reference as the first element and the address of the directory in which to save the Bayesian Optimization outputs, ending with a /, as the second element.
Values = Any string literal for the second element
Default value = bo_output/
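To illustrate how the two directory attributes relate, the sketch below mimics the list form in plain Python. It assumes the pipeline concatenates the list elements in order into a single path, which would explain why each part must end with a /. The helper name `join_parts` is hypothetical and not part of PAL 2.0.

```python
# Hypothetical helper: assumes list-valued attributes are concatenated in order.
def join_parts(parts):
    for part in parts:
        if not part.endswith("/"):
            raise ValueError(f"{part!r} must end with a /")
    return "".join(parts)


run_folder = "C:/Users/MatDisc_ML/"         # example value from the text above
output_folder = [run_folder, "bo_output/"]  # home reference + default second element

print(join_parts(output_folder))  # C:/Users/MatDisc_ML/bo_output/
```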
General Attributes
- test_size
Fraction of the unobserved materials space that needs to be explored by Bayesian Optimization.
Example: If we have a total of 100 materials, then test_size = 0.9 means that we initially use 10 materials to train our surrogate models and then explore the remaining 90 with Bayesian Optimization.
Values = Floating point in (0,1)
Default value = 0.9
- verbose
Set to True if we want to print the progress of the code and Bayesian Optimization iterations without too much detail regarding the fitting process. It mainly prints out the outputs of the Bayesian Optimization iterations.
Values = True or False
Default value = True
- deep_verbose
Set to True if we want to print the progress of the code and Bayesian Optimization iterations with all details including training of the Gaussian Process models, prior means and Bayesian Optimization iteration outputs.
Values = True or False
Default value = False
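The test_size example above can be checked with a few lines of arithmetic. This is only a sketch: the function name `split_counts` is hypothetical, and rounding to the nearest integer is an assumption about how a fractional count would be handled.

```python
def split_counts(n_materials, test_size):
    """Return (n_initial_training, n_to_explore) for a given test_size."""
    if not 0 < test_size < 1:
        raise ValueError("test_size must lie in the open interval (0, 1)")
    # Assumption: the number of materials left to explore is rounded to the nearest integer.
    n_explore = round(n_materials * test_size)
    return n_materials - n_explore, n_explore


print(split_counts(100, 0.9))  # (10, 90)
```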
Input Attributes
- dataset_folder
Folder name suffix used when saving the Bayesian Optimization output for a given dataset.
The standard format of the output folder name is: dataset_folder + test_size + p_Run + Run number
Values = Any string literal
Default value = newDataset
- InputType
The format in which your dataset is stored.
Values = Gryffin, PerovAlloys, PALSearch, MPEA
Default value = Gryffin
The table below shows each input type and its associated file extensions:

| InputType | File Extension |
| --- | --- |
| Gryffin | .pkl |
| PerovAlloys | .csv |
| PALSearch | .xls, .xlsx |
| MPEA | .xls, .xlsx |
- InputPath
List with the home reference as the first element and the address where the dataset is saved, ending with a /, as the second element.
Values = Name of the directory where the dataset is stored
Default value = datasets/
- InputFile
Name of the dataset file.
Values = Filename of the dataset being used
Default value = perovskites_GRYFFIN.pkl
- AddTargetNoise
Set to True if we want to add a small amount of Gaussian noise to the target property.
Values = True or False
Default value = False
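As a sanity check on the input attributes, the sketch below verifies that an InputFile has an extension listed for its InputType, and builds an output folder name from the format given for dataset_folder. Both helper names are hypothetical. The folder name assumes a literal string concatenation with test_size written as a plain decimal; the actual pipeline may format it differently.

```python
from pathlib import PurePosixPath

# Extensions per InputType, copied from the InputType table above.
EXTENSIONS = {
    "Gryffin": {".pkl"},
    "PerovAlloys": {".csv"},
    "PALSearch": {".xls", ".xlsx"},
    "MPEA": {".xls", ".xlsx"},
}


def input_file_matches(input_type, input_file):
    """Return True if input_file has an extension listed for input_type."""
    if input_type not in EXTENSIONS:
        raise ValueError(f"unknown InputType: {input_type!r}")
    return PurePosixPath(input_file).suffix.lower() in EXTENSIONS[input_type]


def output_folder_name(dataset_folder, test_size, run_number):
    """Assumed format: dataset_folder + test_size + p_Run + run number."""
    return f"{dataset_folder}{test_size}p_Run{run_number}"


print(input_file_matches("Gryffin", "perovskites_GRYFFIN.pkl"))  # True
print(output_folder_name("newDataset", 0.9, 1))  # newDataset0.9p_Run1
```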
Feature Selection Attributes
- test_size_fs
Fraction of the data to be used to do feature selection.
In the case of running Bayesian Optimization, this needs to be set the same as the test_size variable mentioned earlier.
Values = Floating point in (0,1)
Default value = 0.1
- select_features_otherModels
Set to True if we want to do feature selection of input descriptors for all models other than Gaussian Process - Neural Network model.
Values = True or False
Default value = True
- select_features_NN
Set to True if we want to do feature selection of input descriptors for the Gaussian Process - Neural Network model.
Values = True or False
Default value = True
- random_state
Sets the seed used to divide the dataset into train and test sets for feature engineering.
Values = Any Real Positive Number
Default value = 40
- onlyImportant
Set to True if we want to output only the features selected from the list of input features.
Values = True or False
Default value = False
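The role of random_state can be illustrated with a seeded split: the same seed always yields the same train and test sets. The sketch below uses Python's standard `random` module as a stand-in, not the splitting routine PAL 2.0 actually uses. It also assumes that test_size_fs is the held-out fraction, following the usual train/test convention. The function name `seeded_split` is hypothetical.

```python
import random


def seeded_split(items, test_size_fs, random_state):
    """Shuffle items with a fixed seed and return (train, test)."""
    if not 0 < test_size_fs < 1:
        raise ValueError("test_size_fs must lie in the open interval (0, 1)")
    shuffled = list(items)
    # A dedicated Random instance keeps the global random state untouched.
    random.Random(random_state).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size_fs)
    return shuffled[n_test:], shuffled[:n_test]


materials = [f"material_{i}" for i in range(20)]
train_a, test_a = seeded_split(materials, 0.1, 40)
train_b, test_b = seeded_split(materials, 0.1, 40)

print(train_a == train_b and test_a == test_b)  # True: same seed, same split
print(len(train_a), len(test_a))  # 18 2
```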
Surrogate Models Training Attributes
- train_NN
Set to True if we want to train the Neural Network model first, before using it as a prior mean to fit the Gaussian Process model.
Values = True or False
Default value = True
- saveModel_NN
Set to True if we want to save the Neural Network model in a file after fitting.
This has to be set to True if we are training the Neural Network model with the given initial data for the first time.
Values = True or False
Default value = True
- train_GP
Set to True if we want to train the Gaussian Process models.
Values = True or False
Default value = True
- predict_NN
Set to True if we want to use the Neural Network model to make predictions.
Values = True or False
Default value = False
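For quick reference, the defaults documented above can be collected in one place. The sketch below is a plain-Python summary, not the way PAL 2.0 reads user_configuration.yml: the names `DEFAULTS` and `build_config` are hypothetical, and for the list-valued attributes only the second element is shown. It also enforces that run_folder is supplied, since it has no default.

```python
# Defaults as documented above; run_folder is required and has no default.
DEFAULTS = {
    "output_folder": "bo_output/",
    "test_size": 0.9,
    "verbose": True,
    "deep_verbose": False,
    "dataset_folder": "newDataset",
    "InputType": "Gryffin",
    "InputPath": "datasets/",
    "InputFile": "perovskites_GRYFFIN.pkl",
    "AddTargetNoise": False,
    "test_size_fs": 0.1,
    "select_features_otherModels": True,
    "select_features_NN": True,
    "random_state": 40,
    "onlyImportant": False,
    "train_NN": True,
    "saveModel_NN": True,
    "train_GP": True,
    "predict_NN": False,
}


def build_config(overrides):
    """Merge user overrides onto the documented defaults."""
    if "run_folder" not in overrides:
        raise KeyError("run_folder is required and has no default")
    unknown = set(overrides) - set(DEFAULTS) - {"run_folder"}
    if unknown:
        raise KeyError(f"unknown attributes: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}


config = build_config({"run_folder": "C:/Users/MatDisc_ML/", "test_size": 0.8})
print(config["test_size"], config["InputType"])  # 0.8 Gryffin
```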