Predictive Analytics for Materials Science
For R&D organizations focused on materials science, information management involves far more than crunching the latest sales figures or integrating statistics from inventory and accounting systems. Whether working to develop affordable fuel cells, lighter airplane parts or scratch-resistant paint, materials and consumer packaged goods (CPG) companies must routinely deal with extremely complex data spanning multiple scientific disciplines, including chemistry, toxicology, biology and physics. And although they invest millions of dollars in research efforts and go to great lengths to capture the results, many are failing to fully capitalize on the value of their organizational knowledge.
The issue is that traditional databases and legacy systems isolated within specific departments or research specialties lack the depth and breadth to keep up with the information management requirements of the modern scientific enterprise. These tools help users keep track of what happened, but not necessarily why it happened. And above all, they can’t make reasonable predictions about what the future holds. “Predictive analytics” techniques such as molecular modeling or response surface methodology offer ways for materials companies to make better use of the many sources of information available to them, reduce the need for expensive lab experiments and ultimately speed product development efforts. Here are three key steps that research organizations can take to extract greater value from their data and engage in more predictive science.
Step 1: Simplify Data Integration
From unstructured text-based laboratory notes and chemical formulation recipes to two- and three-dimensional molecular structures or detailed images, materials science data is notoriously complex. And the volume is enormous: millions of elemental combinations may need to be explored to build a new alloy, polymer or catalyst, for instance, or tens of thousands of formulations screened in order to create a safe and stable skin product. Unless this information can be collected, tracked and analyzed as a cohesive whole, collaboration across scientific disciplines will be difficult and organizations will miss the important connections that lead to breakthrough discoveries.
Data integration in a scientific environment is a particularly daunting challenge, however. This is because of the wide diversity of data formats (including text, numeric and image-based), the number of instruments and variety of equipment that need to be accommodated (everything from microscopes to customized high-throughput rigs) and the many different locations within the R&D organization where research intelligence may be hidden (in the chemistry or toxicology lab, with the processing or engineering department, etc.). Typically, research project stakeholders have turned to manual approaches to leverage disparate information sources, spending hours searching through files, reformatting data, and cutting and pasting reports together, or enlisting IT resources to hand code customized “point-to-point” connections to move data between systems and applications. But in a world where product cycle times are shrinking while the volume and complexity of research data are increasing, manual integration is simply too time-consuming and expensive.
Fortunately, service-oriented technologies are changing this by enabling a more unified approach to managing complex scientific information. A Web services-based foundation for scientific informatics can support the plug-and-play integration of multiple sources of enterprise data. This presents a number of compelling advantages for research organizations: individual scientists can quickly access the information they need, regardless of where or how it is saved; project teams can more easily collaborate and share findings across disciplines and departments; and the company as a whole can better track and reuse valuable research. Simpler integration also facilitates sophisticated analysis that includes multiple data sets – a key requirement for predictive science.
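To make the plug-and-play idea concrete, here is a minimal sketch in Python. It assumes each data source, whatever it stores, is wrapped behind the same small interface so that new sources can be added without custom point-to-point code. The names DataSource, LabNotebookSource, InstrumentSource and unified_record are hypothetical, as are the sample values; a real deployment would call Web services rather than in-memory dictionaries.

```python
from typing import Iterable, Protocol


class DataSource(Protocol):
    """Hypothetical common interface: return whatever is known about a sample."""

    def fetch(self, sample_id: str) -> dict: ...


class LabNotebookSource:
    """Stand-in for unstructured lab notes keyed by sample ID."""

    def __init__(self, notes: dict):
        self._notes = notes

    def fetch(self, sample_id: str) -> dict:
        note = self._notes.get(sample_id)
        return {"note": note} if note is not None else {}


class InstrumentSource:
    """Stand-in for numeric instrument readings keyed by sample ID."""

    def __init__(self, readings: dict):
        self._readings = readings

    def fetch(self, sample_id: str) -> dict:
        return dict(self._readings.get(sample_id, {}))


def unified_record(sample_id: str, sources: Iterable[DataSource]) -> dict:
    """Merge the fields from every source into one record for the sample."""
    record = {"sample_id": sample_id}
    for source in sources:
        record.update(source.fetch(sample_id))
    return record


# Illustrative, made-up data for a single sample.
notebook = LabNotebookSource({"S-001": "Cured 2 h at 80 C; slight yellowing."})
instrument = InstrumentSource({"S-001": {"viscosity_cP": 120.5}})

record = unified_record("S-001", [notebook, instrument])
print(record)
```

Adding a third source, such as a toxicology database, would mean writing one more small class with a fetch method; the code that consumes unified records would not change.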
Step 2: Experiment Virtually
Is it possible to build a more fuel-efficient airliner by replacing aluminum materials with lighter yet high-performing plastics? What combination of emulsifier and dye will yield printer ink with the right color, viscosity and flow? These and other questions could be explored through trial-and-error experimentation, but this approach is both incredibly costly and time-consuming, especially when it comes to complex and highly variable materials such as polymers and nanomaterials that can be impacted by even small changes in chemistry, composition or processing steps.
Technology that facilitates modeling and simulation of chemical compounds and materials presents a compelling alternative to experimentation alone. Used widely in pharmaceutical research, software-enabled scientific modeling and analytic techniques make it possible for researchers to design and test products in silico (i.e., computationally). Instead of running multiple experiments in a lab, researchers can take advantage of simulations to explore a broad range of ideas virtually before doing any actual chemical synthesis. Analytic models can also leverage existing data sets to quickly and reliably predict the behavior of thousands of potential formulations or mixtures, helping researchers to identify the common characteristics of the optimal recipes. Only the most promising are then subjected to experimental screening, further reducing the number of laboratory experiments required.
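As a rough illustration of how an analytic model can narrow the field before any lab work, the Python sketch below fits a simple one-factor response surface (a quadratic curve) to a handful of existing measurements and then ranks untested candidates by predicted performance. Everything here is an assumption made for illustration: the data are synthetic, the single factor (an additive loading) and the response (a hardness score) are invented, and real response surface studies typically involve several factors and a designed experiment.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_quadratic(xs, ys):
    """Least-squares fit of y = b0 + b1*x + b2*x**2 via the normal equations."""
    features = [[1.0, x, x * x] for x in xs]
    xtx = [[sum(f[i] * f[j] for f in features) for j in range(3)] for i in range(3)]
    xty = [sum(f[i] * y for f, y in zip(features, ys)) for i in range(3)]
    return solve_linear(xtx, xty)


def predict(coeffs, x):
    """Evaluate the fitted quadratic at x."""
    b0, b1, b2 = coeffs
    return b0 + b1 * x + b2 * x * x


def shortlist(coeffs, candidates, k):
    """Return the k candidates with the highest predicted response."""
    return sorted(candidates, key=lambda x: predict(coeffs, x), reverse=True)[:k]


# Synthetic archive: additive loading (wt%) vs. measured hardness score.
loadings = [0.0, 1.0, 2.0, 3.0, 4.0]
hardness = [10.0, 13.0, 14.0, 13.0, 10.0]

coeffs = fit_quadratic(loadings, hardness)
candidates = [i * 0.5 for i in range(9)]  # 0.0, 0.5, ..., 4.0 wt%
best = shortlist(coeffs, candidates, k=3)
print(coeffs, best)
```

Only the shortlisted loadings would then go forward to experimental screening, which is where the savings in laboratory effort come from.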
The benefits of predictive analytic approaches are twofold. One, resources that would have been used during extensive experimentation are saved. Two, product design experts can investigate far more options than would be possible through lab experiments alone, and are thus more likely to quickly hit upon the most commercially viable materials candidates. The key here is being able to leverage multiple data sources (including previous experimental archives, current research and even publicly available information) in order to build the most accurate and complete predictive models.
Step 3: Automate Predictive Techniques
A final critical condition needs to be met for predictive analytics to add value for materials researchers: The technologies deployed must be simple enough that project stakeholders throughout the organization (not just informatics or modeling experts) can use them. Automation is the critical ingredient. Automated data integration enables researchers to quickly and easily connect previously siloed pieces of useful information without requiring the help of an IT programmer. (For example, a project team might want to combine chemical data with information from sourcing and distribution systems in order to forecast, during early research, which compound ingredients are affordable choices for large-scale production.) Automated workflows that bring together multiple data sets allow scientists to capture complex modeling and analysis procedures in such a way that specialized expertise can be leveraged and reused by the entire organization. A sophisticated statistical algorithm or complicated model applied to a single research problem has limited value. When it is available to be used over and over again, it becomes an organizational asset. Thus, automation delivers the speed, efficiency and “user-friendliness” critical to increasing the reach of predictive science.
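The sourcing example above can be sketched as a small reusable workflow in Python. This is one possible shape under stated assumptions, not a description of any particular product: the step functions (attach_cost, keep_affordable, rank_by), the run_workflow helper, the ingredient names, the predicted scores and the costs are all hypothetical.

```python
from functools import reduce


def attach_cost(cost_per_kg):
    """Step: join sourcing costs onto chemistry records by ingredient name."""
    def step(records):
        return [
            {**r, "cost_per_kg": cost_per_kg[r["ingredient"]]}
            for r in records
            if r["ingredient"] in cost_per_kg
        ]
    return step


def keep_affordable(max_cost_per_kg):
    """Step: drop ingredients that cost more than the given limit."""
    def step(records):
        return [r for r in records if r["cost_per_kg"] <= max_cost_per_kg]
    return step


def rank_by(field):
    """Step: sort records so the highest value of `field` comes first."""
    def step(records):
        return sorted(records, key=lambda r: r[field], reverse=True)
    return step


def run_workflow(steps, records):
    """Apply each step in order, feeding the output of one into the next."""
    return reduce(lambda data, step: step(data), steps, records)


# Made-up inputs: model predictions from chemistry, prices from sourcing.
chemistry = [
    {"ingredient": "binder_A", "predicted_score": 0.91},
    {"ingredient": "binder_B", "predicted_score": 0.84},
    {"ingredient": "binder_C", "predicted_score": 0.77},
]
sourcing = {"binder_A": 48.0, "binder_B": 12.5, "binder_C": 9.0}

# The workflow is just a list of steps, so it can be saved and reused.
affordability_screen = [
    attach_cost(sourcing),
    keep_affordable(20.0),
    rank_by("predicted_score"),
]
result = run_workflow(affordability_screen, chemistry)
print([r["ingredient"] for r in result])  # ['binder_B', 'binder_C']
```

Because the workflow is an ordinary list of steps, a colleague without modeling expertise could rerun it on a new set of candidate ingredients or change only the cost limit.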
Predictive analytics give organizations engaged in materials research a way to design better products, understand and avoid failures, and make rational research decisions, all on the basis of information that they already have within their enterprises. Assisted by a services-based approach to scientific informatics that supports automated data integration and repeatable processes, predictive techniques can save hundreds of thousands of dollars while also propelling products to market faster. That’s not just good science; that’s good business.