All posts by maxdveit

CECAM/Psi-k Flagship School on Machine Learning Interatomic Potentials for Young and Early-Career Researchers (ML-IP 2023)

Machine learning interatomic potentials (ML-IPs) have now established themselves as a key technique in atomistic modeling. They allow the simulation of many diverse types of systems, from the molecular to the solid state, at the accuracy of highly sophisticated electronic structure methods but at a greatly reduced cost. While the general methodology of training and validating a machine learning potential is well established, many codes and integrated software applications exist to perform these tasks. Since many of these come with a high entry barrier, there is still a need to educate young and early-career researchers in these tasks, and to provide a pathway into the field for researchers whose promising ideas could benefit from the application of ML-IPs, so that they can make valuable contributions.

We organized the ML-IP 2023 school at Aalto University, Finland, from 6 to 10 November 2023 with the broader goal of educating young researchers working on machine learning for materials and molecules on diverse topics, including structural representations, fitting ML models for potentials as well as for properties beyond the ground-state potential energy surface, dataset generation and curation, and software frameworks. This was done keeping in mind the applicants’ interest in and familiarity with scientific applications, to facilitate the “on-boarding” into the field. This endeavor was supported by funding from both Psi-k and CECAM, aided by additional contributions from the Aalto University Department of Chemistry and Materials Science as well as from EPFL’s COSMO laboratory through the ERC-FIAMMA grant.

Applications and participants

Owing to the broad interest in the field, we received an impressive number of applications: over 120 in total for in-person participation alone. Of these, 40 were shortlisted to attend the meeting in person; up to 80 more applicants were selected for online participation. In the spirit of supporting early-career researchers, we prioritized those who could benefit the most from attending the workshop in person. To foster diversity amongst participants, both in terms of experience levels in the field and backgrounds, we gave preference to younger researchers (doctoral students and early postdocs) and selected applicants based on motivation and potential to learn from the workshop. While a good proportion of participants were women and researchers from other traditionally underrepresented groups, we note that we are still far from fair representation of these groups (e.g. gender equality and good representation of non-European researchers), which is an ongoing issue in our research field that we all have a responsibility to address.

We note, especially given the current political climate, that visa issues hindered travel for several participants of non-European nationality, in a few cases resulting in the cancellation of their on-site participation, ultimately harming the workshop’s goals of open scientific exchange.

Workshop format

As the workshop was organized in a hybrid format, talks and hands-on tutorials were given by 13 invited speakers in person at Aalto University, with 3 more speakers joining remotely. Tutorials were held at the end of each workshop day, with sufficient time both for the speaker to present the tutorial and for the attendees to work through it on their own. Despite the preparation work done in advance, technical issues (especially joining and using the supercomputer infrastructure made available to attendees) did pose a barrier for many of the workshop attendees in these sessions.

All of the workshop attendees were given the opportunity to present and discuss their own work at a poster session. Several presentation slots were also made available for the attendees, and the speakers for these slots were chosen based on the poster abstracts. The poster session was held both in person and online (on Gather Town). This dual format provided another opportunity to facilitate interactions between online and on-site participants, although the overall participation in the online poster session was quite low. Nonetheless, the hybrid format enabled contributions to these sessions from speakers who would otherwise have been unable to join the workshop. Awards were presented for the three best contributed talks and three in-person poster presentations.

Finally, a Slack workspace served as an additional opportunity for participants and speakers to have discussions, pose questions, and continue to strengthen the community of young and early-career researchers in the field of machine learning potentials. Many of the speakers kindly participated in these interactions, which further catalyzed the learning process of the attendees.

The scientific program was complemented by social events that gave participants a chance for informal networking in a more relaxed setting, which was appreciated by many.

Feedback and future planning

To conclude the workshop, a focus session was held to receive feedback and suggestions for future editions of this workshop. The feedback from the workshop participants, obtained in person and over Slack during the workshop and in a survey circulated after the workshop, was enthusiastically positive. Participants particularly valued the high proportion of early-career researchers and the ability to actively interact with the speakers in an atmosphere that encouraged a flat hierarchy and interactions between researchers with different experience levels.

Most attendees considered the scientific level of the talks appropriate overall, although some expressed the need for more in-depth and introductory lectures at the graduate level as opposed to the working-level knowledge of most practitioners in the field.
The tutorials, however, posed some issues, most of which were technical, ranging from hassles with registration and accounts on the cluster to long queuing times. These issues significantly impacted the intended interactive nature of the sessions and left the participants with little time to fully engage with the contents of the tutorials and the instructions from the speakers.

The online poster session was useful for some people, but the overall participation was quite low. In the future, requiring or strongly encouraging on-site participants to also add their posters online could help with this problem.

Due to the continuing growth of this field, the number of applicants who had to be turned away, and the interest explicitly expressed by attendees, we expect a future edition of this workshop to be well-received. In the discussion session on the “future of ML-IP” held at the end of the workshop, a need was expressed for a written guide to the organization of future events, as well as a need to rethink the online portion of the event, given the significant additional organizational effort needed to run the online part of the conference.

Key future improvements

Based on this participant feedback and our collective experience as organizers, we identified some key areas of improvement for the next editions of the workshop:

  • Increase the geographic diversity of participants (as the current demographic was mostly Europe-based). While the hybrid nature of the event helped expand the reach to other countries, this representation must also be reflected on-site, even while working under the current restrictions.
  • Actively encourage people from under-represented groups (women, people of color, and LGBTQ+ people, for example) to apply and participate, and additionally become involved in the conference organization. Also, begin gathering demographic data to support this goal.
  • Foster collaboration with industry by more proactively reaching out to industrial contacts for scientific contributions and sponsorship. This could also help support travel costs for those traveling from less-resourced countries.

Report on the “ML-IP 2021” workshop (Young and Early-career Researchers’ Tutorial on Machine Learning Interatomic Potentials)

Machine learning potentials have now established themselves as a method of choice in many atomistic simulation projects. This tutorial workshop was aimed at young and early-career researchers who are interested in using machine learning potentials in their work, but are unsure of where to start or of how feasible the proposed application would be.

While the field continues to produce new theoretical and methodological advances, there is now a large class of systems that can be treated with existing, established methods. The main issues now for new researchers entering the field are, first, choosing between the many different machine learning methods (and correspondingly many software packages) available, and second, learning about simulation workflows and best practices that are often undocumented, unwritten “common knowledge”.

The workshop was designed with two main aims: First, to give these researchers a solid introduction in the basic scientific techniques of designing, fitting, and validating a machine learning potential for a new system. Second, to provide a platform for young researchers interested in using machine learning potentials in their work to connect to those involved in developing methods for machine learning potentials, in order to accelerate the adoption of machine learning techniques in the wider atomistic simulation community.