Report on the “ML-IP 2021” workshop (Young and Early-career Researchers’ Tutorial on Machine Learning Interatomic Potentials)

Machine learning potentials have now established themselves as a method of choice in many atomistic simulation projects. This tutorial workshop was aimed at young and early-career researchers who are interested in using machine learning potentials in their work, but are unsure of where to start or of how feasible their intended application would be.

While the field continues to produce new theoretical and methodological advances, there is now a large class of systems that can be treated with existing, established methods. The main issues now for new researchers entering the field are, first, choosing between the many different machine learning methods (and correspondingly many software packages) available, and second, learning about simulation workflows and best practices that are often undocumented, unwritten “common knowledge”.

The workshop was designed with two main aims: first, to give these researchers a solid introduction to the basic scientific techniques of designing, fitting, and validating a machine learning potential for a new system; and second, to provide a platform for young researchers interested in using machine learning potentials in their work to connect with those developing methods for machine learning potentials, in order to accelerate the adoption of machine learning techniques in the wider atomistic simulation community.

A screenshot from Zoom showing some of the workshop participants. Smile! The workshop participants and speakers gather on Zoom for the first day of the workshop.

To meet these aims, the workshop was designed “for young researchers, by young researchers”: the speakers were selected mainly from researchers actively involved in both the theory and practice of using machine learning potentials, giving them a platform to pass on their own unwritten knowledge. They were asked to prepare comprehensive, one-hour tutorials consisting of some introductory slides and an interactive example, usually in the form of a Jupyter notebook (hosted on the Deepnote platform), in order to give participants both a theoretical grounding and practical experience in the speaker’s topic area.

For the initial edition of this workshop, we decided to limit attendance to 40 participants, both to foster small-group discussions and to keep a high ratio of speakers to participants. The workshop announcement was publicized both on the Psi-k website and on Twitter, which helped it spread quickly. In the end, we received nearly 120 registration requests from over 20 countries around the world, at career stages ranging from undergraduate to early PI. For the difficult task of participant selection, we prioritized motivation as well as willingness to learn and apply the workshop material in future research. The participants who were not selected were still given access to the recorded lectures and teaching materials created for the workshop.

The workshop was organized around four main themes, representing the main challenges encountered when fitting a machine learning potential for a new system:

An introduction and overview of the field
Structural representations and fitting methods
Configurational sampling, database building, and active learning
Validation and scientific applications

The talks and tutorials organized around these themes were given as Zoom meetings, with the tutorial notebooks hosted on the Deepnote platform, which also provided some cost-free computational resources that proved adequate for many of the tutorials.

A screenshot from Gather, showing workshop participants interacting virtually, including video chats with nearby participants. The poster awards ceremony, hosted on the Gather virtual platform.

In addition to the talks, we hosted two poster sessions and a social dinner on the Gather platform, which gave the participants a valuable opportunity to interact in small, informal groups, both with each other and with the speakers. We were very happy to see strong engagement from the speakers, many of whom attended the poster sessions and took part in lively discussions with workshop attendees. Finally, a Slack workspace (now open to the public) provided an additional opportunity for participants and speakers to ask questions, have offline discussions, and continue to build the community of young and early-career researchers in the field of machine learning potentials. We are delighted to see that some participants are already starting to use these methods in their own work, based on the discussions happening on this platform.

The feedback from the workshop participants, provided both over Slack during the workshop and in a form circulated afterwards, was overwhelmingly positive. Participants particularly valued the high proportion of early-career researchers, the ability to interact with the speakers, and the informal atmosphere of the workshop. Many participants also found the publicly available tutorial notebooks to be especially helpful resources, with many continuing to work through the examples after the end of the workshop. All survey respondents said they felt motivated to start using ML potentials in their work after the workshop (40% “agree”, 60% “strongly agree”). We also received several suggestions for improvement in future editions of the workshop. In particular, participants asked for more time to be dedicated to the practical component of the tutorials, as opposed to the theoretical background talks, and to individual experimentation with the notebooks. Several participants also asked for more time to be spent on the topic of training set design and selection, an often-overlooked aspect of building a machine learning potential. Finally, several participants expressed a wish for a more uniform notebook format to make the tutorials easier to use. Overall, the positive response from this workshop’s participants strongly motivates another edition of the workshop in the future, and perhaps similarly structured workshops for other emerging topics in the field of atomistic simulation.

The Organizers

Max Veit
Elena Gelžinytė
Venkat Kapil
Felix-Cosmin Mocanu
Federico Grasselli