Big Data Summer: A summer school of the BiGmax Network

Platja d’Aro, Spain, September 9 – 13, 2019



Gerhard Dehm
Max-Planck-Institut für Eisenforschung, Düsseldorf, Germany
Claudia Draxl
Humboldt-Universität zu Berlin, Berlin, Germany
Matthias Scheffler
Fritz Haber Institute of the Max Planck Society, Berlin, Germany
Jilles Vreeken
CISPA – Helmholtz Center for Information Security, Germany


Materials science is entering an era in which data from experiments and simulations are growing beyond what established scientific methods can handle. The so-called “4 V challenge”, concerning Volume (the amount of data), Variety (the heterogeneity of form and meaning of data), Velocity (the rate at which data may change or new data arrive), and Veracity (uncertainty of quality), is clearly becoming evident. Open issues include, for example, early discrimination between valuable and irrelevant experimental data, understanding errors in both experiment and theory, and assigning error bars and trust levels to density-functional theory high-throughput screening results. Most importantly, however, Big Data in materials science offer a significant opportunity for new insight and knowledge gain when their information is fully exploited with artificial intelligence concepts and methods. All of these aspects, from data processing to exploiting the potential of data-driven materials science, require new and dedicated approaches.

The school was aimed primarily at PhD students and young postdocs. The 15 invited speakers addressed important background and recent advances in data-driven materials science. The topics covered a wide spectrum to demonstrate the challenges and potential that research data offer, including:

  • FAIR principles of scientific data, including hardware aspects
  • introduction and frontiers of artificial intelligence
  • interpretability and causality in machine learning
  • various data-mining tools and the mathematical concepts behind them
  • data diagnostics
  • pattern discovery
  • real-time data processing of emerging experimental setups
  • metadata in computational and experimental materials science.

You can find slides from the invited speakers here:


Monday, September 9, 2019

15:00 Arrival – Coffee break
Session chair: Claudia Draxl
15:30 – 16:30 Matthias Scheffler Welcome and Introduction
16:30 – 17:30 Jilles Vreeken Material Subgroups
17:30 – 18:00 Break
18:00 – 19:00 Hans-Joachim Bungartz Research Data Infrastructures – How Generic Can & Should They Be?
19:30 Welcome Cocktail – Dinner


Tuesday, September 10, 2019

08:00 Breakfast
Session chair: Hans-Joachim Bungartz
09:00 – 10:00 Claudia Draxl The NOMAD Encyclopedia – A Tool for Exploring Computed Data
10:00 – 11:00 Dierk Raabe Big Data-Related Challenges in Microstructure Research and Alloy Design
11:00 – 11:30 Break
11:30 – 12:30 Siyuan Zhang Modern Electron Microscopy Goes High Dimension: Handling Big Data
12:30 – 12:50 Hot Topic Talk: Raabe (Atomic-Scale Imaging of Chemistry at Lattice Defects)
13:00 – 15:00 Lunch Break
Session chair: Matthias Scheffler
15:00 – 16:00 Joseph F. Rudzinski Data-Driven Methods for Soft Matter
16:00 – 16:20 Hot Topic Talk: Vreeken (Telling Cause from Effect)
16:20 – 16:50 Break
16:50 – 20:00 Poster Parade and Poster Session
20:00 Dinner


Wednesday, September 11, 2019

08:00 Breakfast
Session chair: Isao Tanaka
09:00 – 09:45 Markus Rampp High-Performance Data Analytics: Basic Concepts of Distributed Deep Learning
09:45 – 10:45 Karsten W. Jacobsen Machine Learning and Computational Screening
10:45 – 11:15 Break
11:15 – 12:15 Luca M. Ghiringhelli Metadata Towards FAIR Data Sharing for Data-Driven Materials Science
12:15 – 12:55 Hot Topic Talks:
  – Jacobsen (High Entropy Alloys for Catalysis)
  – Draxl (Benchmark Calculations Towards Ultimate Precision in Density-Functional Theory)
13:00 – 15:00 Lunch break
Session chair: Karsten W. Jacobsen
15:00 – 16:00 Cécile Hébert Data Challenges in Analytical Transmission Electron Microscopy: Size, Formats and Annotation
16:00 – 16:40 Hot Topic Talks:
  – Ghiringhelli (Identifying Interpretable Descriptors for Materials Properties with Subgroup Discovery and Information Theory)
  – Rudzinski (Variational Autoencoders for Dimensionality Reduction and Clustering of Molecular Dynamics Data)
16:40 – 17:20 Break
17:20 – 18:20 Annette Trunschke Big-Data Driven Catalysis Research: Challenges and Chances
18:20 – 18:40 Hot Topic Talk: Hébert (Machine Learning Techniques in Analytical TEM: Trends and Challenges)
20:00 Dinner


Thursday, September 12, 2019

08:00 Breakfast
Session chair: Stefan Bauer
09:00 – 10:00 Chiho Kim Polymer Informatics: Past, Present and Future
10:00 – 10:40 Hot Topic Talks:
  – Bauer (Learning Disentangled Representations)
  – Tanaka (Data Driven Discovery of New Materials)
10:40 – 11:10 Break
11:10 – 12:10 Luca M. Ghiringhelli Learning Descriptors for Materials Properties with Symbolic Regression and Compressed Sensing
12:10 – 12:30 Hot Topic Talk: Trunschke (Clean Data Acquisition in Oxidation Catalysis)
13:00 Lunch break
14:30 Excursion and Conference Dinner


Friday, September 13, 2019

08:00 Breakfast
Session chair: Matthias Scheffler
09:00 – 10:00 Stefan Bauer Recent Advances in Unsupervised Representation Learning
10:00 – 11:00 Isao Tanaka Recommender System for Materials Discovery
11:00 Concluding remarks



