Project 28:
The Impact of Dataset Size on Bayesian Optimization: Insights from the QM9 Dataset
The “Chihuahuas” team investigates how much data Bayesian optimization needs to produce reliable results on the QM9 dataset. Our study aims to identify the minimum dataset size required for dependable optimization outcomes, a question that matters in fields where data is scarce or costly to acquire. By systematically examining how Bayesian optimization performs across varying dataset sizes, we intend to offer insights into the best use of limited data. This helps ensure that Bayesian optimization remains a viable and effective tool when dataset size is constrained, in deep learning for chemistry and beyond.
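The kind of size sweep described above could be sketched as follows. This is only an illustration under stated assumptions, not the team's actual pipeline: "dataset size" is taken to mean the number of labelled points available before optimization starts, the surrogate is a plain NumPy Gaussian process with an upper-confidence-bound acquisition, and a synthetic 3-D function stands in for QM9 descriptors and targets. All function names (`gp_posterior`, `run_bo`, `size_sweep`) and parameter values are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    # Squared-exponential kernel between the rows of a and the rows of b.
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    # Exact GP regression on mean-centred targets (assumed surrogate model).
    mean_y = y_train.mean()
    k = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_s = rbf_kernel(x_train, x_query)
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train - mean_y))
    v = np.linalg.solve(chol, k_s)
    mu = k_s.T @ alpha + mean_y
    var = np.maximum(1.0 - np.sum(v**2, axis=0), 1e-12)
    return mu, np.sqrt(var)

def run_bo(x_pool, y_pool, n_init, n_iter, rng, beta=2.0):
    # Start from n_init random labelled points, then add one point per
    # iteration by maximising the upper confidence bound over the rest.
    labelled = list(rng.choice(len(x_pool), size=n_init, replace=False))
    for _ in range(n_iter):
        rest = np.setdiff1d(np.arange(len(x_pool)), labelled)
        mu, sigma = gp_posterior(x_pool[labelled], y_pool[labelled], x_pool[rest])
        labelled.append(int(rest[np.argmax(mu + beta * sigma)]))
    return float(y_pool[labelled].max())

def size_sweep(sizes, n_iter=15, n_repeats=5, seed=0):
    # Synthetic stand-in for QM9: random 3-D "descriptors" and a smooth target.
    rng = np.random.default_rng(seed)
    x_pool = rng.uniform(-2.0, 2.0, size=(300, 3))
    y_pool = -np.sum(x_pool**2, axis=1) + 0.5 * np.sin(3.0 * x_pool[:, 0])
    best = y_pool.max()
    results = {}
    for n in sizes:
        # Regret = gap between the true optimum and the best value found.
        regrets = [best - run_bo(x_pool, y_pool, n, n_iter, rng) for _ in range(n_repeats)]
        results[n] = (float(np.mean(regrets)), float(np.std(regrets)))
    return results

if __name__ == "__main__":
    for n, (mean_regret, std_regret) in size_sweep([5, 20, 80]).items():
        print(f"n_init={n:3d}  regret={mean_regret:.3f} +/- {std_regret:.3f}")
```

Plotting mean regret and its spread against the starting size is one way to look for the point beyond which additional data stops improving reliability; repeating each size with several random seeds is what makes the spread, and hence "reliability", measurable.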
References:
- Anatole von Lilienfeld and Kieron Burke. “Retrospective on a decade of machine learning for chemical discovery”. In: Nature Communications 11.1 (Sept. 2020). DOI: 10.1038/s41467-020-18556-9. URL: https://doi.org/10.1038/s41467-020-18556-9.