Many science and engineering applications involve optimizing expensive-to-evaluate black-box functions over high-dimensional design spaces. Canonical examples include design optimization over the input spaces of candidate proteins, molecules, drugs, hardware architectures, and superconducting materials. One effective framework for solving such black-box optimization problems is Bayesian optimization, in which we iteratively evaluate the black-box function at inputs recommended by a surrogate model, whose accuracy improves as it learns from the input-output pairs collected through this online querying. However, in many real-world scenarios the overhead and cost of setting up experiments are prohibitive (e.g., wet lab experiments that often require expensive materials and equipment), which makes black-box optimization in the online setting impractical. A more practical setting is to assume access to an existing database of previously collected input-output pairs, known as an offline dataset, and to solve the problem in an offline manner. The goal of this tutorial is to present a comprehensive and structured survey of the relatively young area of offline optimization, including different families of methods, theoretical developments, real-world applications, and open challenges.
The target audience of this tutorial includes:
General AI researchers and graduate students who will learn about principles, algorithms, and outstanding challenges to explore the frontiers of black-box optimization from offline datasets and its real-world applications;
Industrial AI researchers and practitioners who will be able to apply the learned knowledge to solve design optimization problems;
Researchers and practitioners working on science and engineering applications, such as drug/vaccine design and materials design, who will learn about useful new tools.