Title: Emerging Technologies in Scientific Data Visualisation
Location: CECAM-IT-SISSA-SNS Node, in Scuola Normale Superiore (Pisa, Italy)
Webpage with list of participants, schedule and abstracts of presentations: https://www.cecam.org/workshop-1586.html
Dates: April 4, 2018 to April 6, 2018
Organizers:
Stefano de Gironcoli (International School for Advanced Studies (SISSA) and CNR-DEMOCRITOS IOM, Trieste, Italy)
Emine Kucukbenli (SISSA, Trieste, Italy)
Giordano Mancini (Scuola Normale Superiore, Pisa, Italy)
Monica Sanna (Scuola Normale Superiore, Pisa, Italy)
State of the art:
Visualisation lets us tap into the high-bandwidth cognitive hierarchies of our brains and process high densities of information at once. In the field of atomistic and molecular simulations, it is a key element of research: we use ball-and-stick figures to represent simulation scenarios, and graphs to recognize or communicate parametric relationships in equations. The “Big Data” trend has given rise to several projects with vast data output, and many data-driven approaches are being introduced. For instance, a new EU Center of Excellence, “NOMAD”, has been established to collect, store and regularize data to build a materials encyclopedia.
Visual analytics is also making its way into materials simulations beyond the traditional approaches. Two notable examples are i) a successful crystal structure prediction study using a data clustering method supported by visual analytics, and ii) a time-aggregated 2D heat-map method that reduces the time needed to explore the inner tunnels of proteins. Nevertheless, visual analytics beyond XY plots or ball-and-stick representations is still an emerging field. Several aspects are yet to be identified and discussed between different communities.
Some of the open questions of the state-of-the-art that the workshop addressed are:
-Data Producers: What are the emerging visualization needs for Big Data, and how do they differ from scaled-up versions of existing tools?
-Data Analysts: How to enhance current analysis tools or create new ones with visualization? What visual analytics techniques, representations and mapping methods can we borrow from other fields now that the molecular simulations can produce a variety of data other than molecular representations?
-Technologists: How can we better use the developing technologies such as Virtual Reality, haptic feedback mechanisms, graphical artificial neural networks, and computer vision to reveal patterns and relationships that were previously not exposed to visualization at all?
Before the workshop began, we, the organizers, doubted whether it would be possible to hold the interest of contributors and attendees from so many different research areas (about half of the participants came from communities outside the CECAM “core business” of atomistic and molecular simulations). As the workshop began we realised that many invited speakers shared the same doubts.
However, from the first talk onward these doubts vanished and remained a non-issue for all three days. All the contributions were followed with keen interest by attendees, and the audience raised many questions after every presentation. This may be a point for further consideration, as it suggests that an event drawing on both our community and others can be an opportunity for a fruitful cross-fertilisation of ideas. These positive aspects could, of course, have been outweighed by constraints on the scope and depth of the talks, but the feedback collected during the workshop did not highlight this issue, and many attendees stated that the talks were a source of (unexpected) inspiration for their research. Another general point worth mentioning concerns the basic visualization techniques discussed, particularly on the first day: according to their feedback, these constituted learning material for many participants (even experienced ones). Perhaps a different event (such as a school, or part of one) blending visualization and machine learning may be worth considering for future CECAM events.
The workshop was an opportunity to discuss the following topics, which participants perceived as very important:
- Data visualization for complex data sets: the importance of selecting the most appropriate metaphor for conveying information to the reader from different sources and by different means, and of correctly selecting elements such as glyphs and colour maps. The adaptation of these choices to different contexts (e.g. research, dissemination) was also stressed. Direct Volume Rendering and the ability to switch between representations as the number of represented objects increases or decreases were cited as “must have” features for modern visualization applications.
- Creation of intelligent workflow systems. Development and testing of scientific software. The large computational infrastructures available in Europe make it possible to carry out high performance computing (HPC)-based investigations in materials science and soft matter. We strongly need shared resources across Europe (and outside Europe?) and, more importantly, environments providing a flexible and customisable integration of such resources. Sharing and validation are needed if the large amount of data generated is to be exploited to the fullest extent. The same holds for scientific software development, testing and versioning. Projects such as NOMAD constitute a model for such efforts.
- Immersive Virtual Reality for science. The potential benefits and disadvantages of IVR in scientific applications were debated. On the one hand, some of the participants presented IVR environments at different degrees of maturity, demonstrating that with IVR it is possible to integrate very large and complex data sets (especially in molecular medicine and genomics) and even to collaborate within these environments. On the other hand, some of the attendees argued that IVR may be too unfocused and distracting for users and that effort should be concentrated on scalable 2D applications. Alternatives to IVR (such as multiple display walls) were also presented.
- Machine Learning and Molecular Dynamics. The accuracy and flexibility of MD Force Fields (FF) trained using Neural Networks were shown and discussed as one approach that will have a deep impact on Molecular Simulations. Unsupervised Learning methods, such as clustering algorithms and dimensionality reduction methods, were also shown to be central to any large-scale simulation study and to the development of effective visualization methods.
One need highlighted for this type of event was to increase the time allotted to open and informal discussion relative to the presentation of talks. The demonstrations of IVR were appreciated, but there was perhaps an imbalance between the focus given to Virtual Reality in the demonstrations and in the talks. Other “hands on” sessions should have been considered in the schedule.
From a general point of view, inviting experts to show how to tap into new, emerging technologies has been fruitful and should be done again in the future, even if it is not straightforward to organize these events without straying too far from the spirit of CECAM workshops. The workshop certainly highlighted a big gap between what specialists, either from the visualization world or from Big Data and Analytics, consider a solid standard that should be widely used and what many (young) participants normally use in their studies. This gap may perhaps be overcome by making more powerful middleware available (see the second topic above), but also by organizing more mixed-type events such as this one.
Typical channels for organizing these future conferences would be CECAM and Psi-K. However, in future events a greater commitment from private firms working in data analytics and/or visualization should be sought, to maintain the level of multidisciplinarity of the workshops. We had one sponsorship from hardware vendors, but our approaches to other companies failed, not for lack of interest (e.g. one speaker came from the Unicredit bank research team) but for lack of time: big companies (or public organizations) not already aware of the type of research carried out by the CECAM community may need many bureaucratic steps before agreeing to a sponsorship, even if the economic commitment is not very significant to them (such was precisely the case for Unicredit).
Will these developments bring societal benefits?
Computational approaches are becoming central in every scientific discipline; the open research policies set in H2020 push researchers to adopt data-intensive approaches to remain competitive. Using this type of event to increase the CECAM community's awareness of the new possibilities created by the Big Data trend therefore has the potential to increase the quality and quantity of research from the community and, as a consequence, to bring societal benefits in all the areas where atomistic and molecular simulation may have an impact, such as nanomedicine or new materials. It may also give scientists with a different background a chance to get in touch with what CECAM is and what type of research its associates do. The development and deployment of powerful and engaging visualization and virtual reality technology has the potential to bring highly effective ICT tools for education and science outreach in the molecular sciences to a wider audience, in agreement with the H2020 guidelines on open education.