YOUNG RESEARCHER’S WORKSHOP ON MACHINE LEARNING FOR MATERIALS 2022 09-13 May 2022, Trieste (IT) – Report

State of the Art and Workshop Objectives

Over the past decade, data-driven methods have emerged as a novel paradigm for advancing materials discovery. Machine learning potentials (MLPs) enable the sampling of trajectories with an accuracy close to that of high-level electronic structure methods, at a fraction of their cost, and have become an established means of rationalizing puzzles previously beyond the reach of atomistic simulations. Elsewhere, the chemical and physical properties of large chemical spaces are now screened in a high-throughput fashion by leveraging artificial intelligence methods, materials simulations, and automation protocols. Screening is no longer limited to known structures: generative models can now autonomously propose previously unseen molecules and crystal structures tailored to a target property. Machine learning (ML) methods therefore serve as formidable surrogates to accelerate expensive computational screening, and also to guide experimental screening and extract knowledge from data gathered via high-throughput studies or from the literature. Furthermore, advances in the theoretical understanding of how machine learning algorithms work are demystifying these approaches and moving the field beyond the view of data-driven methods as magic black boxes.

This event built upon the state of the art in machine learning for materials in two ways. First, it helped train the next generation of young researchers in the latest methods and applications of AI for materials discovery through didactic lectures and hands-on tutorials. Second, the workshop promoted a discussion of the implications of the latest advances in data-driven methods for the different sub-areas of materials discovery, bringing together experts from different fields within machine learning for materials and promoting the cross-fertilization of ideas and techniques.

Outcomes

The introductory part of the workshop provided an overview of supervised and unsupervised methods that represented well the state of the art in machine learning for materials. Participants were offered a number of tutorials on publicly available, documented, open-source codes.

Many discussions during the workshop focused on the design of atomic descriptors for supervised tasks in materials science. Various speakers highlighted two techniques as top performers: atom-density representations (e.g., the atomic cluster expansion, ACE), and equivariant representations learnt via message-passing networks (MPE(3)N). Both methods efficiently encode information about local atomic environments and allow for very accurate learning of atomic or structural properties (e.g., forces and energies). The search for a unified theory to reconcile the dichotomy between ACE and MPE(3)N was a recurring theme across invited and contributed talks. Speakers also often reported state-of-the-art Pareto fronts of prediction speed versus accuracy, together with analyses of the memory and time requirements for training.
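The speed-accuracy Pareto fronts mentioned above can be illustrated with a minimal Python sketch. It is not taken from any talk: the model names, timings, and errors are hypothetical placeholders, and both quantities are assumed to be "lower is better".

```python
from typing import List, Tuple

# (name, inference time per atom, force error); all values below are
# hypothetical and only serve to illustrate the construction.
Model = Tuple[str, float, float]


def pareto_front(models: List[Model]) -> List[Model]:
    """Return the models that are not dominated in (time, error).

    A model is dominated if another model is at least as fast and at least
    as accurate. Exact duplicates in (time, error) are reported once.
    """
    front: List[Model] = []
    best_error = float("inf")
    # Sweep from fastest to slowest; a slower model earns its place on the
    # front only if it is strictly more accurate than every faster one.
    for model in sorted(models, key=lambda m: (m[1], m[2])):
        if model[2] < best_error:
            front.append(model)
            best_error = model[2]
    return front


example_models: List[Model] = [
    ("A", 1.0, 50.0),
    ("B", 2.0, 30.0),
    ("C", 3.0, 35.0),   # slower and less accurate than B: dominated
    ("D", 10.0, 10.0),
    ("E", 12.0, 10.0),  # slower than D with the same error: dominated
]

for name, time_per_atom, error in pareto_front(example_models):
    print(name, time_per_atom, error)
```

Models off the front (C and E in this toy data) are never the best choice at any speed budget, which is why such plots are a compact way to compare descriptor families.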

A second set of common themes and techniques related to the use of generative models. Their applications ranged across disciplines: SMILES-based long short-term memory (LSTM) recurrent neural networks were used to design drugs; Cartesian coordinates and autoencoders were used to obtain, in an unbiased manner, equivariant representations for quantitative structure-activity relationships; classical descriptors and variational autoencoders were adopted to map states during dynamics and/or glass phenotypes.

A third set of common themes related to the use of machine learning methods to accelerate the first-principles screening of material stability. Applications ranged from energy materials (e.g., perovskites) to molecular crystals (e.g., drugs), and leveraged uncertainty-driven methods to iteratively and accurately chart convex hulls and establish thermodynamically stable phases.
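To make the convex-hull criterion concrete, the following is a minimal Python sketch for a hypothetical binary system A(1-x)B(x). It shows only the hull construction and the resulting "energy above hull"; the uncertainty-driven selection of new candidates discussed at the workshop is not reproduced here. The compositions and formation energies are invented for illustration, and the input is assumed to include both end members.

```python
from typing import Dict, List, Tuple

# (fraction x of B, formation energy per atom); hypothetical values.
Point = Tuple[float, float]


def _cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def lower_hull(points: List[Point]) -> List[Point]:
    """Lower convex hull of energy versus composition (monotone chain)."""
    # Keep only the lowest-energy phase at each composition.
    lowest: Dict[float, float] = {}
    for x, e in points:
        if x not in lowest or e < lowest[x]:
            lowest[x] = e
    hull: List[Point] = []
    for p in sorted(lowest.items()):
        # Drop the last vertex while it lies on or above the segment to p.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull


def energy_above_hull(x: float, e: float, hull: List[Point]) -> float:
    """Distance of a phase at (x, e) above the hull; zero for hull phases."""
    for (x0, e0), (x1, e1) in zip(hull, hull[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return e - (e0 + t * (e1 - e0))
    raise ValueError("composition outside the range covered by the hull")


phases: List[Point] = [
    (0.0, 0.0), (0.25, -0.10), (0.5, -0.40), (0.75, -0.15), (1.0, 0.0),
]
hull = lower_hull(phases)
for x, e in phases:
    print(x, e, round(energy_above_hull(x, e, hull), 3))
```

Phases with zero energy above the hull are predicted to be thermodynamically stable against decomposition into the other listed phases; in an iterative workflow, a surrogate model would be used to decide which candidates near the hull deserve a first-principles calculation next.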

A final class of major points of scientific discussion encompassed a broader spectrum of topics that help bridge complexity gaps between data models and experiments, so as to establish rational design paradigms from structure-property relationships. The heterogeneous list of techniques debated includes (but is not limited to): machine learning potentials for fast and accurate simulation of complex dynamics; transfer learning to derive universal predictors that work well across different chemistries; and experimental characterization and manipulation of materials via data-driven optimization.

Overall, the need for cross-fertilization of expertise emerged as a strong and resonant topic throughout the conference. During panel discussions, presentations, and face-to-face interactions, participants expressed the need to escape scientific bubbles and gather information about techniques, applications, and developments in fields adjacent to their own research. In this regard, the presence of leaders in atomistic modeling, computer science, and machine learning, together with experimentalists and industry representatives, enabled an interdisciplinary exchange of perspectives and experiences.

Our workshop was specifically designed to address the need for multi-disciplinary cross-fertilization, and we received resounding feedback that this effort was successful. To our knowledge, and according to a survey still ongoing at the time of writing, all participants, whether invited speakers, online attendees, poster presenters, or young researchers attending their first conference on the topic, were largely positive about the structure, topics, and organization of the event.

Finally, all talks, tutorials, and panel discussions that took place during the workshop have been recorded and uploaded to YouTube, the ICTP website, and the conference’s website, thus making high-quality scientific content available to anyone.

Dissemination

The workshop allowed researchers in adjacent fields to meet each other, learn about the most recent advances of their colleagues, and network in a scientifically fertile environment. Moreover, the introductory school included in the workshop allowed young researchers who are new to the field to learn about potential applications of machine learning technologies to their area of interest, and to better appreciate the advances presented by invited leading researchers during the “workshop” part of the conference. From the networking perspective, the event’s north star was to enable the cross-fertilization of research networks, promoting encounters and collaboration between domain experts.

From the computational perspective, the tutorials ran seamlessly on Google Colab. All the codes discussed were open-source, and all tutorials presented during the workshop are available on the conference’s website for anyone to follow.

The computational expense associated with the machine learning for materials codes described is mostly related to data generation; for the tutorials, data from the literature were used. A discussion was put forward on how to develop accessible and efficient machine learning codes that do not necessarily require expensive computational architectures (e.g., highly parallel GPU facilities). Similarly, participants reflected on the need to push open science (open code, open data, etc.) to ensure the democratization of the field. While open repositories and robust generation routines exist for computational data (e.g., Materials Cloud, ioChem-BD, NOMAD, Materials Project, AFLOW), a discussion was initiated on how to promote the creation of FAIR-compliant routines and databases for experiment-related data and codes.

Scientific, Technological, and Societal Impact

The discovery of novel materials for catalytic applications, energy storage, diagnostics, and therapeutics is one of the key ways in which the goal of sustainable and equitable development can be reached (see also the UN Sustainable Development Goals, SDGs).

Recent years have seen a surge in the development and application of machine learning technologies in materials science. The initial results are extremely promising, with applications ranging from the design of novel catalysts for CO2 reduction to the exploration of the chemical space of energy materials for novel lithium-free batteries. While giant steps have been made in the design of algorithms and in understanding the theoretical backbone of machine learning in materials and chemical science, widespread and large-scale applications are just starting to bloom and to have a real-world impact, allowing, e.g., for the discovery of novel stable materials or the design of never-before-seen catalysts or drugs.

This workshop pushed forward research in these critical fields by providing both a way to spread research advances to young researchers and a way to initiate collaborations between widely renowned scientists from different fields. It further equipped the next generation of scientists (in academia and industry alike) with the skills to tackle the complex problems related to high-performance materials design.

The presence of industry representatives (Roche, Bayer, Microsoft Research, AIndo) served two purposes. On the one hand, it offered participants an overview of possible career pathways. On the other hand, it promoted the state-of-the-art methods and achievements of our community to these R&D teams.