The BO Hackathon for Chemistry and Materials ’24 has concluded. See below for the teams, project descriptions, videos, and GitHub repositories of each project.

  • Project 1: Multi-objective Benchmarking of Dragonfly against BoTorch

    We will compare the performance of Dragonfly against state-of-the-art multi-objective BO approaches. Dragonfly has gained some popularity in the community, but we want to provide evidence that the latest BoTorch solutions may perform better.

  • Project 2: Long-run Behaviour of Multi-fidelity Bayesian Optimisation

    Based on our recent workshop paper, we will investigate the causes of long-run MFBO failure.

  • Project 3: Take Your Time - Improving Optimization Performance Through Greater Investment in ACQF Optimizer Runtime

    This project investigated the performance tradeoff of investing computational resources into acquisition function optimization. We demonstrated the impact of random seed initialization on optimization campaign performance and devised a simple algorithm, Random Retries, that mitigates this sensitivity and improves the consistency and performance of Bayesian optimization on difficult optimization problems.
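
    The idea of retrying acquisition-function optimization under different random seeds and keeping the best result can be sketched as below. This is a minimal illustration under our own reading of the description, not the team's Random Retries implementation: the inner optimizer (`optimize_acqf_once`, a random-start hill climber) and the toy acquisition function are hypothetical stand-ins for a real acquisition optimizer and a real GP-based acquisition function.

```python
import numpy as np


def optimize_acqf_once(acqf, bounds, seed, n_init=32, n_steps=200, step=0.05):
    """Hypothetical inner optimizer: best of n_init random points, then hill climbing."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds[:, 0], bounds[:, 1]
    # The outcome depends on the seed through these random starting points.
    starts = rng.uniform(lo, hi, size=(n_init, len(lo)))
    x = starts[np.argmax([acqf(s) for s in starts])]
    best = acqf(x)
    for _ in range(n_steps):
        # Propose a small Gaussian perturbation, clipped to the bounds.
        cand = np.clip(x + rng.normal(scale=step * (hi - lo)), lo, hi)
        val = acqf(cand)
        if val > best:
            x, best = cand, val
    return x, best


def random_retries(acqf, bounds, n_retries=5, base_seed=0):
    """Run the inner optimizer under several seeds and keep the best candidate."""
    results = [optimize_acqf_once(acqf, bounds, seed=base_seed + i) for i in range(n_retries)]
    return max(results, key=lambda r: r[1])


# Toy multimodal "acquisition function" on [0, 1]^2 (illustration only).
def toy_acqf(x):
    return float(np.sin(12 * x[0]) * np.cos(9 * x[1]) + 0.5 * x[0])


bounds = np.array([[0.0, 1.0], [0.0, 1.0]])
x_best, val_best = random_retries(toy_acqf, bounds, n_retries=5)
single_vals = [optimize_acqf_once(toy_acqf, bounds, seed=s)[1] for s in range(5)]
print(val_best >= max(single_vals))  # True: the retry result is the best single run
```

    The spread of `single_vals` across seeds gives a quick view of how sensitive the inner optimizer is to its initialization on a given problem.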

  • Project 4: SimpleGPT-BO, Simplified GPT-Powered Bayesian Optimization

    SimpleGPT-BO is an intuitive tool designed to make ChatGPT-driven Bayesian Optimization accessible and straightforward for beginners and experienced users alike. The idea is simply to use the MyGPT platform provided by ChatGPT to build a BO tool, by providing the generated instructions and code, to allow...

  • Project 5: Comparing Bayesian Optimization Methods Across Multiple Hyperparameters Against Simulated "Human" Decision-making

    This project will focus on simulating the decision-making of a human researcher using a Bayesian Optimization framework, then comparing performance across different, improved hyperparameter settings. By exploring these differences, this project aims to understand the strengths and weaknesses of Bayesian Optimization relative to the decision-making of a researcher...

  • Project 6: Multi-Objective Bayesian Optimization for Transparent Electromagnetic Interference Shielding with Thin-Film Structures

    We will investigate the problem of transparent electromagnetic interference shielding to protect electronic circuits or devices by finding an optimal nano-structure using Bayesian optimization. We will parameterize a thin-film structure by the material and thickness of each layer, and then optimize two objective functions with multi-objective Bayesian optimization. In addition,...
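
    Progress in a two-objective BO campaign is commonly scored by the hypervolume: the area dominated by the observed points and bounded by a reference point. The sketch below computes it for two objectives that are both maximized. It is a generic illustration, not the team's code; the objective values and the reference point are made up.

```python
import numpy as np


def hypervolume_2d(Y, ref):
    """Area dominated by the points in Y (both objectives maximized), bounded below by ref."""
    Y = np.asarray(Y, dtype=float)
    ref = np.asarray(ref, dtype=float)
    # Only points better than the reference point in both objectives contribute.
    Y = Y[(Y[:, 0] > ref[0]) & (Y[:, 1] > ref[1])]
    # Sweep in order of decreasing first objective.
    Y = Y[np.argsort(-Y[:, 0])]
    hv, best_y2 = 0.0, ref[1]
    for y1, y2 in Y:
        # A point adds area only if it beats the best second objective seen so far;
        # dominated points are skipped automatically.
        if y2 > best_y2:
            hv += (y1 - ref[0]) * (y2 - best_y2)
            best_y2 = y2
    return hv


# Hypothetical (objective 1, objective 2) pairs; the third point is dominated by the first.
Y = [[3.0, 1.0], [1.0, 3.0], [2.0, 0.5]]
print(hypervolume_2d(Y, ref=[0.0, 0.0]))  # 5.0
```

    Hypervolume-based acquisition functions such as EHVI rank candidate designs by how much they are expected to increase this quantity.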

  • Project 7: BayBE One More Time - Exploring Corrosion Inhibitors for Materials Design

    This project focuses on exploring the capabilities of Bayesian optimization, specifically employing BayBE, in the discovery of novel corrosion inhibitors for materials design. Initially, we work with a randomly chosen subset from a comprehensive database of electrochemical responses of small organic molecules for aluminum alloys. Our goal is to assess...

  • Project 8: BO for Drug Discovery: What is the role of molecular representation?

    Current applications of BO in materials research rely on basic molecular representations such as the simple elemental composition. However, in drug discovery and more complicated materials discovery, a myriad of molecular featurizations (or fingerprints) has been proposed for the representation of molecules. The majority of publications [1,2] followed the convention...

  • Project 9: Optimizing The CO2 Uptake of Metal-Organic Frameworks Using Thompson Sampling

    Metal-organic frameworks (MOFs) are nanoporous materials that show great promise for large-scale carbon capture. In this work, we adopt the CRAFTED MOF dataset and build a Bayesian optimization model with a Thompson sampling acquisition function to select candidate MOFs with high CO2 uptake. We benchmark Thompson sampling against random...
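
    Thompson sampling over a finite candidate pool draws one function from the GP posterior over all unmeasured candidates and picks the candidate where that draw is highest. The sketch below uses a small NumPy GP with a fixed RBF lengthscale and noise level (both assumptions) and a random, hypothetical descriptor matrix with a toy objective in place of the CRAFTED data; it is not the team's model.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=0.3):
    """Squared-exponential kernel between the rows of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


def thompson_select(X_pool, observed_idx, y_obs, rng, noise=1e-4):
    """Return the unobserved pool index that maximizes one joint GP posterior sample."""
    observed_idx = np.asarray(observed_idx)
    cand_idx = np.setdiff1d(np.arange(len(X_pool)), observed_idx)
    X_obs, X_cand = X_pool[observed_idx], X_pool[cand_idx]
    # Standardize targets so a zero-mean, unit-variance GP prior is reasonable.
    y = (y_obs - y_obs.mean()) / (y_obs.std() + 1e-12)
    K = rbf_kernel(X_obs, X_obs) + noise * np.eye(len(X_obs))
    K_s = rbf_kernel(X_obs, X_cand)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    V = np.linalg.solve(L, K_s)
    mean = K_s.T @ alpha
    # Posterior covariance plus a small jitter for numerical stability.
    cov = rbf_kernel(X_cand, X_cand) - V.T @ V + 1e-6 * np.eye(len(X_cand))
    # A single draw from the joint posterior over all remaining candidates.
    sample = rng.multivariate_normal(mean, cov)
    return int(cand_idx[np.argmax(sample)])


rng = np.random.default_rng(0)
X_pool = rng.uniform(size=(200, 3))                  # hypothetical MOF descriptors
f = lambda X: -np.sum((X - 0.7) ** 2, axis=1)        # toy stand-in for CO2 uptake
observed = list(rng.choice(200, size=10, replace=False))
for _ in range(5):
    observed.append(thompson_select(X_pool, observed, f(X_pool[observed]), rng))
print(len(set(int(i) for i in observed)) == 15)  # True: each pick is a new candidate
```

    Because each iteration uses a fresh posterior draw, exploration comes from the randomness of the sample itself rather than from an explicit exploration parameter.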

  • Project 10: Navigating the black box of zeolite synthesis with Bayesian Optimization

    Despite their massive industrial importance as catalysts, ion exchangers and adsorbents, zeolite synthesis still mostly relies on heuristics, experience and a sprinkle of magic. The parameter space is vast, comprising continuous variables (concentration of reagents, temperature, synthesis time…) and categorical variables (choice of precursor salts…). Different objectives are required depending...

  • Project 11: BlendDS - An intuitive specification of the design space for blends of components

    We will design “BlenDS”, a framework for an intuitive specification of the design space for blends of components. The framework will be based on the following concepts: there are two entities, the “Component” and the “Blend”; they can be recursively combined in a tree-like structure to create the final tree...

  • Project 12: Robust GPs for Sustainable Concrete via Bayesian Optimization

    We will provide a tutorial on how to use Robust GPs (https://arxiv.org/abs/2311.00463) for the Sustainable Concrete via Bayesian Optimization.

    At the end of 2023, Meta published its research on “Sustainable Concrete via Bayesian Optimization”. Altamirano et al. published their work on “Robust and Conjugate Gaussian Process Regression” at the...

  • Project 13: Interpretability of Bayesian Optimisation Campaigns

    Bayesian Optimisation in Synthetic Biology is an emerging topic. Traditionally, classical DoE by its design gives some insight into the interactions of input variables such as vitamins or trace materials. In a BO campaign, we can efficiently navigate that high-dimensional input space, but we do not get direct insight...

  • Project 14: Bayesian optimization of likely negative candidates in imbalanced biological datasets

    Available peptide datasets often lack class balance because high-throughput screening methods face experimental and technical challenges in identifying negative examples, which limits the effectiveness of machine learning (ML) models trained on these datasets. A promising solution involves exclusively leveraging positive examples. This method, known as positive-unlabeled (PU) learning[1],...

  • Project 15: Adaptive Batch Sizes for Bayesian Optimization of Reaction Yield

    Challenge:

    Our goal is to design a Bayesian Optimization (BO) framework to optimize chemical reactions towards maximal reaction yield. We assume that every experiment, regardless of which one, takes a fixed time (i.e. 1 hour) if performed individually, but experiments can be combined in batches. The model is retrained...

  • Project 16: BOPE-GPT, Preference Exploration with the curious AI chemist

    When facing a chemistry problem with many outputs, you might want to optimise one output first, like the yield of a reaction, or look for trade-offs for several outputs in a multi-objective fashion. In the latter, objectives/outputs may not be equally important (even across iterations) and very often, a human...

  • Project 17: Comparative Analysis of Acquisition Functions in Bayesian Optimization for Drug Discovery

    This project compares various acquisition functions by their effect on the efficiency of Bayesian Optimization (BO) in the drug discovery process, focusing particularly on small, diverse, unbalanced, and noisy datasets. The study will evaluate the impact of different acquisition functions, molecular featurization methods, and applicability domain (AD) across...

  • Project 18: Investigation of Multi-Objective Bayesian Optimization of QM9 Dataset

    This project will investigate the application of multi-objective Bayesian optimization (specifically EHVI- and ParEGO-based methods) to several multi-objective optimization benchmark tasks on the QM9 dataset. The objective is to develop specific guidelines for the choice of surrogate models and acquisition functions in Multi-Objective Bayesian Optimization for molecular property optimization.

    ...
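
    ParEGO turns a multi-objective problem into a sequence of single-objective ones: at each iteration it draws a weight vector and scalarizes the normalized objectives with the augmented Chebyshev function, max_j(λ_j f_j) + ρ Σ_j λ_j f_j, using ρ = 0.05 in the original method. The sketch below assumes minimization and draws weights uniformly from the simplex with a Dirichlet distribution; the objective values are toy numbers, and this is not the project's code.

```python
import numpy as np


def augmented_chebyshev(Y, weights, rho=0.05):
    """ParEGO-style scalarization for minimization; Y is (n, m), assumed normalized to [0, 1]."""
    weighted = Y * weights                       # lambda_j * f_j(x) for every point and objective
    return weighted.max(axis=1) + rho * weighted.sum(axis=1)


def sample_simplex_weights(m, rng):
    """Draw a weight vector uniformly from the (m - 1)-simplex."""
    return rng.dirichlet(np.ones(m))


def normalize(Y):
    """Min-max normalize each objective column to [0, 1]."""
    lo, hi = Y.min(axis=0), Y.max(axis=0)
    return (Y - lo) / np.where(hi > lo, hi - lo, 1.0)


rng = np.random.default_rng(1)
Y = np.array([[0.2, 0.9], [0.5, 0.5], [0.9, 0.1], [0.8, 0.8]])  # toy values, minimize both
w = sample_simplex_weights(Y.shape[1], rng)
scores = augmented_chebyshev(normalize(Y), w)
print(w, scores)
```

    A single-objective GP is then fitted to `scores` and optimized as usual; redrawing `w` each iteration spreads the search along the Pareto front.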
  • Project 19: Quantum Bayesian Optimization for Automatic Chemical Design

    We introduce a novel approach to chemical design by implementing a Quantum Bayesian Optimization (QBO) method, enhancing the data-driven continuous representation of molecules. Leveraging the foundation of traditional BO used for exploring chemical spaces, our QBO framework integrates quantum computing principles to refine the optimization process. Through quantum parallelism, the...

  • Project 20: Closed loop optimization of hydrogel formulations using dynamic light scattering

    Hydrogels are hydrophilic crosslinked polymer networks used for cell culture, drug delivery, agriculture and tissue engineering. Depending on the application (e.g. 3D printing or injection), hydrogels require different mechanical and rheological properties, with companies such as Millipore-Sigma already selling hydrogels with custom stiffness. The hydrogel formulation - usually water, polymer...

  • Project 21: Benchmarking MolDAIS

    Recent works [1,2] are increasingly turning towards active encoding of molecular feature spaces. The motivation behind active encoding is that a priori encodings may not exhibit a smooth response to an arbitrary molecular property, reducing the performance of sample-efficient optimization algorithms, such as Bayesian optimization.
    This project will...

  • Project 22: Chemical Similarity-Informed Earth Mover’s Distance Kernel Bayesian Optimization for Predicting the Properties of Molecules and Molecular Mixtures

    The default distance function used in the kernel functions of Gaussian Processes is the Euclidean distance. The configuration of these kernels significantly influences the model’s performance. In this work, we will develop a series of chemical similarity-informed custom kernels for Bayesian Optimization to predict the properties of molecules. Additionally, we...
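
    To illustrate replacing the Euclidean distance inside a GP kernel, the sketch below uses the simplest case of the Earth Mover's Distance: normalized distributions over the same ordered, unit-spaced bins, where the EMD equals the L1 distance between the cumulative distributions. This is a hypothetical simplification; a chemical similarity-informed EMD would use a ground distance between molecules and require solving a transport problem. The kernel exp(-EMD/ℓ) is valid here because, in this one-dimensional case, it is a Laplacian kernel on the cumulative vectors; not every distance yields a positive semi-definite kernel.

```python
import numpy as np


def emd_1d(p, q):
    """EMD between normalized histograms on the same ordered, unit-spaced bins."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    return float(np.abs(np.cumsum(p) - np.cumsum(q)).sum())


def emd_kernel(P, Q, lengthscale=1.0):
    """Gram matrix k(p, q) = exp(-EMD(p, q) / lengthscale) between rows of P and Q."""
    return np.array([[np.exp(-emd_1d(p, q) / lengthscale) for q in Q] for p in P])


# Hypothetical mixtures described as fractions over three ordered bins.
P = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
print(emd_1d(P[0], P[1]), emd_1d(P[0], P[2]))  # 1.0 2.0
K = emd_kernel(P, P)
```

    The Euclidean distance between the raw fraction vectors is √2 for both pairs, whereas the EMD treats moving mass to a neighbouring bin as cheaper than moving it to a distant one, and the kernel inherits that notion of similarity.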

  • Project 23: Reliable Surrogate Models of Noisy Data

    Research and development in lab settings necessarily results in imperfect data collection. Noise is introduced into R&D-scale datasets from a number of possible factors, such as human error in measurement, equipment malfunctions, contamination, calibration errors, changing analytical methods, imperfect reproducibility between scientists, and environmental factors (e.g., temperature, humidity).

    Due to...

  • Project 24: ScattBO Benchmark - Bayesian optimisation for materials discovery

    A self-driving laboratory (SDL) is an autonomous platform that conducts experiments selected by machine learning (ML) to achieve a user-defined objective. One such objective could be to synthesise a specific material.[1] Such an SDL will synthesise a material, evaluate whether it is the target material, and, if necessary, optimise the synthesis parameters...

  • Project 25: Bayesian Optimized De Novo Drug Design for Selective Kinase Targeting

    This project employs Bayesian optimization using Gaussian process (GP) models with the Tanimoto kernel and fingerprint features for de novo design of selective growth factor receptor (GFR) inhibitors. GP surrogate models of docking scores will drive optimization of a docking-based objective function balancing potent target binding and minimal off-target interactions,...
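
    The Tanimoto kernel compares binary fingerprints by the fraction of on-bits they share: the shared bits divided by the bits set in either fingerprint. A minimal NumPy version is below; generating real fingerprints (for example, with RDKit) is outside the sketch, so the bit vectors are made up. GP libraries aimed at chemistry, such as GAUCHE, provide their own Tanimoto kernel implementations.

```python
import numpy as np


def tanimoto_kernel(A, B):
    """Tanimoto (Jaccard) similarity between rows of binary fingerprint matrices A and B."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    inner = A @ B.T                              # bits set in both fingerprints
    norm_a = np.sum(A * A, axis=1)[:, None]      # bits set in each row of A
    norm_b = np.sum(B * B, axis=1)[None, :]      # bits set in each row of B
    # Assumes no all-zero fingerprints; otherwise the denominator can be zero.
    return inner / (norm_a + norm_b - inner)


fps = np.array([[1, 1, 0, 1],
                [1, 0, 1, 1],
                [0, 0, 1, 0]])
print(tanimoto_kernel(fps, fps).round(3))
```

    The diagonal is 1, and the first two fingerprints, which share two of the four bits set across them, score 0.5.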

  • Project 26: Multiple-Context Bayesian Optimization

    Traditionally, Bayesian Optimization (BO) is performed for a specific optimization task, e.g., for optimizing a cell culture medium for a specific cell type. If the medium is to be optimized for a different cell type, a new, uncorrelated optimization campaign is started. In multi-context BO (which could also be referred...
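
    One common way to share data between contexts (here, cell types) is a GP whose kernel over (x, context) is the product of a kernel on x and a context-correlation matrix, as in the intrinsic coregionalization model. The sketch below assumes two contexts, a hand-set correlation ρ, and made-up data; in practice the correlation would be learned, and the project's own approach may differ. With ρ = 0, the prediction for the unseen context stays at the prior mean of zero; with ρ = 0.9, data from context 0 informs it.

```python
import numpy as np


def rbf(a, b, lengthscale=0.2):
    """Squared-exponential kernel on 1-D inputs."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)


def posterior_mean(x_train, c_train, y_train, x_test, c_test, rho, noise=1e-4):
    """GP posterior mean with kernel k((x, c), (x', c')) = rbf(x, x') * B[c, c']."""
    B = np.array([[1.0, rho], [rho, 1.0]])       # context correlation (assumed, not learned)
    K = rbf(x_train, x_train) * B[c_train][:, c_train] + noise * np.eye(len(x_train))
    K_s = rbf(x_test, x_train) * B[c_test][:, c_train]
    return K_s @ np.linalg.solve(K, y_train)


# Hypothetical data: three measurements in context 0 (e.g. cell type A), none in context 1.
x_train = np.array([0.2, 0.5, 0.8])
c_train = np.array([0, 0, 0])
y_train = np.array([0.3, 1.0, 0.4])
x_test, c_test = np.array([0.5]), np.array([1])

m0 = posterior_mean(x_train, c_train, y_train, x_test, c_test, rho=0.0)
m9 = posterior_mean(x_train, c_train, y_train, x_test, c_test, rho=0.9)
print(m0, m9)
```

    The correlation matrix is the only place where contexts interact, so ρ directly controls how much a campaign on one cell type transfers to another.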

  • Project 27: How does initial warm-up data influence Bayesian optimization in low-data experimental settings?

    Real-world experiments in chemistry and materials science often involve very small initial datasets (10-100 data points). In this project, we propose to investigate how the 1) size and 2) distribution of the initial dataset influence the performance of Bayesian optimization algorithms. We propose experiments on molecular property optimization tasks.

    Check...

  • Project 28: The Impact of Dataset Size on Bayesian Optimization, Insights from the QM9 Dataset

    The “Chihuahuas” team’s research focuses on establishing the critical threshold of dataset size for achieving reliable results with Bayesian optimization for the QM9 dataset. Our study aims to discern the minimum dataset volume necessary for dependable optimization outcomes, a question of paramount importance in fields where data may be scarce...

  • Project 29: A Bayesian Approach to Predict Solubility Parameters

    Solubility plays a critical role in numerous domains, affecting everything from liquid miscibility and polymer stability to solid adsorption. Its accurate and swift prediction can significantly advance a variety of industries, including organic semiconductors, paint coatings, pharmaceuticals, and chemical synthesis in general. However, the challenge in predicting solubility lies in...

  • Project 30: Active learning for voltammetry waveform design

    This project will detail the design of fast voltammetry waveforms for neurochemical detection using Bayesian optimization. Fast voltammetry is conventionally used to detect neurochemicals in the brain. However, the voltammetry waveform of choice is an underappreciated source of information content due to a lack of design principles and intractable design...

  • Project 31: A tutorial on ask/tell mode for Ax

    This tutorial is aimed at experimentalists in wet lab settings who have data workflows in an ask/tell format. We use real-world data from voltammetry to demonstrate setting up such a workflow. We cover the inner workings of Ax at a beginner-friendly level. We hope this tutorial will streamline the dry-lab...
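
    The ask/tell pattern separates proposing an experiment from reporting its result, which suits lab workflows where measurements arrive hours or days later. The sketch below shows only that control flow, using a hypothetical `AskTellOptimizer` that proposes random points; it is not Ax's API. In Ax's Service API, `AxClient.get_next_trial` and `AxClient.complete_trial` play the roles of `ask` and `tell`, with a model-based proposal instead of random sampling.

```python
import random


class AskTellOptimizer:
    """Hypothetical ask/tell loop; random proposals stand in for a BO model."""

    def __init__(self, bounds, seed=0):
        self.bounds = bounds            # {"name": (low, high)}
        self.rng = random.Random(seed)
        self.pending = {}               # trial index -> parameters awaiting a result
        self.completed = []             # (parameters, result) pairs
        self._next_index = 0

    def ask(self):
        """Propose parameters for the next experiment and register them as pending."""
        params = {k: self.rng.uniform(lo, hi) for k, (lo, hi) in self.bounds.items()}
        index = self._next_index
        self._next_index += 1
        self.pending[index] = params
        return index, params

    def tell(self, index, result):
        """Report a measured result for a previously asked trial."""
        params = self.pending.pop(index)
        self.completed.append((params, result))

    def best(self):
        """Return the completed (parameters, result) pair with the highest result."""
        return max(self.completed, key=lambda pr: pr[1])


opt = AskTellOptimizer({"x1": (0.0, 1.0), "x2": (0.0, 10.0)})  # hypothetical parameters
idx, params = opt.ask()          # hand `params` to the experimentalist...
measured = 0.42                  # ...who runs the experiment later and records a value
opt.tell(idx, measured)
print(opt.best())
```

    Keeping pending trials keyed by index lets several experiments be in flight at once and lets results be reported in any order.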

  • Project 32: Efficient Protein Mutagenesis using Bayesian Optimization

    This project focuses on developing a Bayesian optimization workflow for protein mutagenesis to enhance protein binding affinity. Using predictions generated by a model such as the one outlined in the study from Rube et al. (Nat. Biotech., 2022), our approach aims to optimize protein mutagenesis for biologics and enzyme engineering...

  • Project 33: Bayesian Optimization for Hyperspectral Co-heritability Search

    Genomic prediction models are often used to predict which crops may be good selections for the next round of breeding. This is done to improve plant resilience to pests and climate change and increase yield. Our interest is in finding good proxy data for training genomic prediction models. The proxy...

  • Project 34: Streamlining Material Discovery - Bayesian Optimization in Thermal Fluid Mixtures

    In this project, our objective is to evaluate active research areas in Bayesian Optimization (BO) that are applicable to thermal fluids. We aim to utilize BO to improve the process of discovering new mixtures while also developing a methodology for fine-tuning hyperparameters, taking into account constraints on thermal fluid properties....

  • Project 35: Tutorial for GAUCHE - A Library for Gaussian Processes in Chemistry

    This project involves creating tutorials for GAUCHE. See the tutorial “Input Warping Bayesian Optimisation Over Molecules”.

    References:

    1. Griffiths, Ryan-Rhys, Leo Klarner, Henry Moss, Aditya Ravuri, Sang Truong, Yuanqi Du, Samuel Stanton et al. “Gauche: A library for Gaussian processes in chemistry.” Advances in Neural Information Processing Systems 36 (2024).
    ...
  • Project 36: Scalable Nonmyopic Bayesian Optimization in Dynamic Cost Settings

    Bayesian optimization is a widely used approach for making optimal decisions in uncertain scenarios by acquiring information through costly experiments. Many real-world applications can be cast as instances of this problem, ranging from designing biological sequences to conducting ground surveys. In these contexts, the cost associated with each experiment can...

  • Project 37: The Effects of Post-Modelling Performance Metric Computation on the Efficiency of Bayesian Optimizers

    Introduction

    Bayesian optimization is often considered a black-box technique; however, several simple modifications to its design can vastly improve the optimizer's computational and iterative performance on certain tasks. The modification explored here relates to which aspects of a problem are actually modelled during the optimization procedure. Raw data collected...

  • Project 38: Benchmarking Bayesian Symbolic Regression

    This project investigates the use of Bayesian methods in symbolic regression for use in the physical sciences. Symbolic regression (SR) is a machine learning approach that aims to obtain an analytical mathematical expression to fit a dataset, through optimizing the mathematical operations and coefficients within the expression. The interpretability of...

  • Project 39: Divide and Conquer - Local Gaussian Processes to design Covalent Organic Frameworks for Methane Deliverable Capacity

    In this project, we will explore the application and performance of local GP models in the Bayesian Optimization framework to maximize the Methane Deliverable Capacity of COFs. The methane deliverable capacity is important because it determines the amount of natural gas that can be stored on board vehicles. We use...

  • Project 40: Optimizing Chemical Reaction Conditions with Multi-Agent Systems Using Large Language Models and Bayesian Optimization

    This project is focused on enhancing the efficiency of the Suzuki reaction process through an advanced multi-agent system, incorporating large language models (LLMs) and Bayesian Optimization (BO). The innovation lies in the employment of specialized sub-agents, each with expertise in a crucial domain of the reaction: catalyst design, solvent effects,...

  • Project 41: RAMBO I - Retrieval augmented initialization for Bayesian optimization strategy

    This project aims to incorporate literature knowledge to jump-start Bayesian optimization by finding relevant studies that match the design space of the reaction to optimize. By using Retrieval Augmented Generation (RAG) we can map the...

  • Project 42: Project 42 Mqs_bodoe

    Formulation design of pharmaceutical drug compounds with multi-component mixtures, combined with design of experiments via Bayesian Optimization and an automated lab setup.

    Define data schemas for the interfaces between the laboratory device, container algorithms (COSMO-SAC predictions), and Bayesian optimization for design of experiments.

    References:

    https://blog.mqs.dk/posts/10_cosmo/10_cosmo/

  • Project 43: Bayesian Optimization Awesome List

    We will work to create a GitHub Awesome List resource for Bayesian optimization.

  • Project 44: RBBO - rank-based Bayesian optimization

    Applying ranking models to Bayesian optimization for discovery in materials-based libraries. Ranking models use a form of metric learning that aims to sort inputs rather than directly predict a regression label. This changes the structure of the loss (comparing pairs and lists of inputs) and allows the...
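
    The change in loss structure can be illustrated with a pairwise logistic (RankNet-style) loss: instead of penalizing the error between a predicted value and the measured one, it penalizes every pair whose predicted order disagrees with the measured order. This is a generic sketch, not RBBO's loss, which may use a different pairwise or listwise formulation; the values are toy numbers.

```python
import numpy as np


def pairwise_logistic_loss(scores, y):
    """Mean RankNet-style loss over all pairs (i, j) with y[i] > y[j]."""
    scores, y = np.asarray(scores, dtype=float), np.asarray(y, dtype=float)
    i, j = np.nonzero(y[:, None] > y[None, :])   # every pair where i should rank above j
    margins = scores[i] - scores[j]
    # log(1 + exp(-margin)), computed stably; small when i is scored well above j.
    return float(np.mean(np.logaddexp(0.0, -margins)))


y = np.array([0.1, 0.7, 0.3, 0.9])               # measured property (toy values)
good = np.array([-1.0, 1.0, 0.0, 2.0])           # scores in the same order as y
bad = -good                                      # scores in reversed order
print(pairwise_logistic_loss(good, y) < pairwise_logistic_loss(bad, y))  # True
```

    Only the order of `y` enters the loss, so rescaling or shifting the measured labels leaves it unchanged, which is one motivation for ranking models when labels are noisy or on inconsistent scales.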

  • Project 45: Bayesian Optimization for Generality

    This project focuses on benchmarks and algorithms for “generality-oriented” Bayesian Optimization (BO). Usually, BO works by identifying the parameters $\textbf{x}$ that optimize a single objective $f(\textbf{x})$. However, in the natural sciences, problems often involve several related objectives $\{f_i(\textbf{x})\}_{i=1}^{n}$. Here, the aim is to find parameters $\textbf{x}$ that do well across...
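
    A minimal way to turn several related objectives $f_1(\textbf{x}), \dots, f_n(\textbf{x})$ into a single generality score is to aggregate them for each candidate, for example by their mean (good on average) or their minimum (good in the worst case). The aggregation functions and toy values below are assumptions for illustration; the project's own definition of generality is not given in this summary.

```python
import numpy as np


def generality_score(F, aggregate="mean"):
    """Aggregate an (n_candidates, n_objectives) matrix of f_i(x) values into one score each."""
    F = np.asarray(F, dtype=float)
    if aggregate == "mean":
        return F.mean(axis=1)        # rewards good average performance
    if aggregate == "min":
        return F.min(axis=1)         # rewards good worst-case performance
    raise ValueError(f"unknown aggregate: {aggregate}")


# Toy values: rows are candidate parameters x, columns are related objectives f_1..f_3
# (for example, yields of one set of reaction conditions on three substrates).
F = np.array([[0.9, 0.2, 0.8],
              [0.6, 0.6, 0.5],
              [0.1, 0.95, 0.3]])
print(np.argmax(generality_score(F, "mean")), np.argmax(generality_score(F, "min")))  # 0 1
```

    The two aggregations pick different candidates on the same data, so the choice of generality measure is itself a design decision for the benchmark.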