Current applications of Bayesian optimization (BO) in materials research rely on basic molecular representations, such as simple elemental composition. In drug discovery and more complex materials discovery, however, a myriad of molecular featurizations (or fingerprints) have been proposed for representing molecules. Most publications [1,2] follow the convention of using extended-connectivity fingerprints (ECFP). The impact of these representations on BO performance nevertheless remains largely underexplored.

The present project aims to investigate how different molecular representations affect BO performance, using a published quantitative structure-property relationship (QSPR) dataset.

References:

  1. H. Bellamy, A. A. Rehim, O. I. Orhobor, and R. King, Batched Bayesian Optimization for Drug Design in Noisy Environments, J. Chem. Inf. Model., vol. 62, no. 17, pp. 3970–3981, Sep. 2022.
  2. D. Reker and G. Schneider, Active-learning strategies in computer-assisted drug discovery, Drug Discovery Today, vol. 20, no. 4, pp. 458–465, Apr. 2015.