Today at the Materials Research Society meeting in Boston, Cyrus Wadia, the Assistant Director for Clean Energy and Materials Research and Development at the White House Office of Science and Technology Policy, challenged the audience to “think as a network” of collaborators, not individual researchers, and to “learn from colleagues” how to better handle large data sets.
The recently publicized presidential Materials Genome Initiative aims to accelerate the delivery of new materials to sustain the production and technology advantage of US industry. This initiative mirrors the focus Accelrys has on integrated multi-scale modeling and laboratory-to-plant optimization. At the heart of these initiatives is improving the prediction and management of materials innovation and design, from the nano-scale right up to the process and macro production scale.
As many Accelrys users know, multi-scale modeling, in which the results of atomistic and micro-scale simulation are combined with larger-scale process data, high-throughput screening and testing, and laboratory information, yields valuable insights into materials design. Combining computational and real analytical instruments with well-organized laboratory data allows models to be used at all length scales, and can be considered a driver of the innovation cycle at many companies.
The MGI initiative clearly recognizes this function, and proposes developing a framework and ontologies through which information, models and data can be exchanged at different levels and different time scales. Making information more widely available to the engineer and materials scientist for collaboration will help drive more accurate and systematic materials development and usage. Accelrys is very excited about this initiative and is sure its technology will help accelerate the innovation lifecycle of new materials—a clearly stated goal of this initiative.
I was just creating a slide the other day for the new Materials Studio Collection. As I wrote about being able to automatically generate meaningful reports directly from within Pipeline Pilot, I added a bullet point about “not having to copy and paste” anymore. I re-read this line a few times, as something about it was ringing a bell.
I remembered that, when we launched Materials Studio 10 years ago, one of the big advantages was that you could copy and paste charts, structures and values directly into a report. Our previous software packages had always run on Silicon Graphics IRIX machines, which had no interaction with Windows machines. Even getting a simple screenshot of a molecule into your favorite Windows word processor required you to capture the screen on the SGI, transfer the image over to your Windows machine using FTP, and finally insert it into your report.
It seems funny that we used to go to all these lengths and spend so much time doing something that can now be totally automated using the Materials Studio Collection. Want to change the look of all the tables in your report? Modify the stylesheet. Need to plot the cell volume vs pressure to generate phase diagrams for multiple structures? Add a property reader and plot the charts automatically.
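Pipeline Pilot builds these reporting protocols graphically, but the underlying idea can be sketched in plain Python. Everything below is a hypothetical stand-in: the property names, data values, and HTML layout are mine, not the product's.

```python
# Sketch of automated report generation: turn parsed calculation results
# into an HTML report with no copy and paste. (Generic Python, not the
# actual Pipeline Pilot protocol; data and layout are hypothetical.)

def build_report(rows, title="Cell volume vs. pressure"):
    """Render (pressure, volume) records as a simple HTML table report."""
    lines = [
        f"<html><head><title>{title}</title></head><body>",
        f"<h1>{title}</h1>",
        "<table border='1'>",
        "<tr><th>Pressure (GPa)</th><th>Volume (A^3)</th></tr>",
    ]
    for p, v in rows:
        lines.append(f"<tr><td>{p:.1f}</td><td>{v:.2f}</td></tr>")
    lines.append("</table></body></html>")
    return "\n".join(lines)

# Toy data standing in for properties read from calculation output files.
data = [(0.0, 120.5), (5.0, 112.3), (10.0, 106.8)]
report = build_report(data)
```

Changing the look of every table then means editing one template function (or, in the real product, the stylesheet) rather than reworking each report by hand.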
As a Product Manager, it is also interesting to think about how user requirements have changed. Back in the days of SGI, creating a report didn’t take much time compared with the actual computation, so it was easy to justify transferring documents by hand. As compute times decreased and the number of calculations being run increased, copy and paste made it simpler to create reports. Now that compute times are fast and people are running high-throughput in-silico calculations with thousands of structures or multiple calculations, reports need to be generated automatically; otherwise you would spend far longer creating the reports than running the calculations.
Has the requirement really changed? I don’t think so. As a scientist, what I really want to do is spend as little time as possible generating a report so I can focus on the interesting parts – understanding and improving the system.
Want to know more about the Materials Studio Collection? I will be giving a webinar on Wednesday 23rd June outlining the new features and benefits.
3D Pareto surface shows the tradeoffs among target properties: dipole moment, chemical hardness, electron affinity. The optimal leads are colored red, poor leads blue.
How do you search through 10^6 materials to find just the one you want? In my very first blog post, "High-Throughput: What's a Researcher to Do?", I discussed some ideas. The recent ACS meeting had a session devoted to doing just that for materials related to alternative energy, as I wrote here and here.
My own contribution was work done with Dr. Ken Tasaki (of Mitsubishi Chemicals) and Dr. Mat Halls on high-throughput approaches for lithium ion battery electrolytes. This presentation is available now on Slideshare (a really terrific tool for sharing professional presentations).
We used high-throughput computation and semi-empirical quantum mechanical methods to screen a family of compounds for use in lithium ion batteries. I won't repeat the whole story here; you can read the slides for yourselves, but here are a couple take-away points:
Automation makes a big difference. Obviously, automation tools make it a lot easier to run a few thousand calculations. But the real payoff comes when you do the analysis. When you can screen this many materials, you can start to perform interesting statistical analyses and observe trends. The 3D Pareto surface in the accompanying image shows that you can't optimize all the properties simultaneously; you need to make tradeoffs. Charts like this one help you understand the tradeoffs and make recommendations.
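The Pareto idea behind charts like that one can be sketched in a few lines of Python. The candidate tuples below are toy values, and for simplicity I assume all three properties are to be maximized (in the real screen, some targets are minimized):

```python
# Minimal Pareto-front extraction for multi-property screening.
# Each candidate is a (dipole, hardness, affinity) tuple of toy values;
# all three properties are assumed to be "higher is better" here.

def dominates(a, b):
    """True if a is at least as good as b in every property
    and strictly better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep only candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates if other != c)]

leads = [(1.0, 2.0, 3.0), (2.0, 1.0, 3.0), (0.5, 0.5, 0.5), (2.0, 2.0, 1.0)]
front = pareto_front(leads)  # the all-around-worse (0.5, 0.5, 0.5) drops out
```

Every lead remaining on the front represents a genuine tradeoff: improving one property means giving up another, which is exactly what the 3D surface visualizes.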
Don't work any harder than you need to. I'm a QM guy and I like to do calculations as accurately as possible. That isn't always possible when you want to study 1000s of molecules. Simply looking through the literature let us know that we can get away with semi-empirical.
Enjoy the Slideshare, watch for more applications of automation and high-throughput computation, and let me know about your applications, too.
“Chemistry for a Sustainable World” was the theme of the ACS spring meeting. My own interests, of course, are in modeling, and more recently in high-throughput computation. The Monday morning session of CINF was devoted to the application of these methods to electronic and optical materials. The presentations included both methodological approaches to materials discovery and applications to specific materials, with a focus on materials for alternative energy.
Prof. Aspuru-Guzik described a project to help look for the best molecules possible for organic photovoltaics to provide inexpensive solar cells and for the polymer membranes used in fuel cells for electricity generation, as well as how best to assemble those molecules into devices.
Uniquely, the project uses the IBM World Community Grid to sort through the myriad materials and perform these long, tedious calculations. You may remember this approach from SETI@home, which was among the first to try it. This gives everyone the chance to contribute to these research projects: you download software from their site, and it runs on your home computer like a screensaver; when your machine is idle, it’s contributing to the project. Prof. Aspuru-Guzik said that some folks are so enthusiastic that they actually purchase computers just to contribute resources to the project. It’s great to see this sort of commitment from folks who can probably never be individually recognized for their efforts.
I don’t want to make this blog too long, so great talks by Prof. Krishna Rajan (Iowa State), Prof. Geoffrey Hutchison (U of Pittsburgh), and Dr. Berend Rinderspacher (Army Research Labs) will be covered in the next blog.
I was also quite happy to see that some of the themes in my presentation were echoed by the others – so I’m not out in left field after all! I’ll blog about my own talk later on, but here’s a quick summary: like Prof. Aspuru-Guzik’s work, we used high-throughput computation to explore new materials, but we were searching for improved Li-ion battery electrolytes. We developed a combinatorial library of leads, set up automated computations using Pipeline Pilot and semiempirical VAMP calculations, and examined the results for the best leads. Stay tuned for a detailed explanation and a link to the slides.
And keep an eye out, too, for the 2nd part of Michael Doyle's blog on Sustainability.
Join us at the ACS Spring 2010 Conference, held at Moscone Center in San Francisco on March 21-25, 2010. At booth #1008, Accelrys will be showcasing the latest developments in Pipeline Pilot, Discovery Studio and Materials Studio.
We have a variety of talks, workshops and posters planned for the conference, including:
In the 21st century, materials and energy are more topical than ever before. Insights at the atomistic and quantum level help us to design cleaner energy sources, and find less wasteful ways of using energy. Join us on March 16th as Dr. George Fitzgerald presents "High-throughput Quantum Chemistry and Virtual Screening for Lithium Ion Battery Electrolyte Materials."
• How modeling can support the discovery of components to enhance the performance of lithium ion battery formulations
• How to use Materials Studio components in Pipeline Pilot to analyze and screen a materials structure library for Li-ion battery additives
• Results from a collaboration with Mitsubishi Chemical, Inc., which was also published in the Journal of Power Sources
This presentation is part of our ongoing webinar series that showcases how Accelrys products and services are transforming materials research. You can download related archived presentations in this series or register for future webinars.
We look forward to sharing our insights with you throughout this webinar series.
There were several invited talks on semiconductors and catalyst nanoparticles, apart from my talk on alternative energy. Many of the speakers discussed the suitability of particular simulation approaches for specific applications, while others discussed the most recent state-of-the-art theoretical advances for tackling real problems at several timescales. It is particularly challenging when simulations are to be used not just for gaining insight into a system but as a predictive tool and for virtual screening. While virtual screening is a well-established art in the world of small-molecule drug discovery, it is only now gaining traction in the materials world.
Chatting with Christopher Lipinski at Drug Discovery & Development Week
Some ten years ago, I first “met” the Lipinski rules in a software project. That was my last direct “hands-on” encounter with chemistry. At Accelrys I am the senior product manager for the Biosciences and Analytics Collections for Pipeline Pilot. Think genomics, proteomics, sequencing, and ontologies, not chemistry! This week I was at the DDDW show in Boston – don’t think “booth babe”.
The conference was not as busy this year as it had been in the past, and it was the afternoon of the last day. A distinguished gentleman wearing a name tag reading “Christopher A. Lipinski” walked up to our booth, happy to see a fellow booth dweller. Half in jest, I asked if he might be the man with the 5 rules. It turns out he was, and, boy, was I in for an intellectual treat. That Lipinski filter came to life in a new way over the next hour or so. I was spellbound by Dr. Lipinski’s breadth of knowledge, passion for science, and out-of-the-box thinking. What I didn’t anticipate were his insights into the importance of chemistry for the biomarker and translational research space.
He was saying some really awesome things so I started writing them down. It was hard to focus on note taking because Dr. Lipinski is an excellent speaker and very animated. Below are a few items that I am willing to share in no particular order:
Translational research must have good chemistry married to good biology.
Your company (Accelrys) combines chemistry and biology in one software application. If biologists are using your software to look at high throughput screening (assay) data that has associated chemical structures, they could better filter out results for poor compounds.
When faced with people problems (like chemistry—biology conflicts) versus technical problems—the people problems are always much more difficult to solve.
The people side is the most important.
NIH is making good strides in the dialog between chemists and biologists.
As soon as the biologist has an assay for a small molecule they should probe/stress test the assay with compounds known historically to cause assay problems.
Software for the (bench) biologist needs to be dead easy. Too many peer-reviewed publications have great biology but rotten chemistry.
Biologically active compounds are tightly clustered in chemical space. It is always best to look for new activity in areas of chemical space where you previously found activity.
It takes 10 years to “mature” a medicinal chemist. He then becomes an expert in pattern recognition, even if he can’t articulate why certain structures look better than others.
Many previously proprietary databases are now in the public domain (See PMID: 17897036). These provide a great starting point for the discovery of drugs for rare diseases.
Dr. Lipinski’s long and prestigious career in medicinal chemistry, assay development, and computational chemistry, and now in consulting, lecturing, and expert-witness work, does not look anything like retirement. That is good news for me.
Dr. Lipinski is shown here with his rapt audience.
Note: Lipinski’s total number of rules actually equals 4. His rules are known as the “Rule of Five” because each of them incorporates the number 5 in some way. For all you literalists out there, “5 Rules” should be interpreted in this way.
We announced the European User Group meeting a few days ago. Check out the UGM webpages, and especially the themes. I am excited that we’ll have users from all product and application areas together, including:
• Materials Studio
• Discovery Studio and Platform
• Training sessions and the lot.
We’re in different tracks, but I expect to see some interesting overlaps and crossovers. For example, we’ll discuss how the high-throughput methods used in materials connect to the platform, along with what the collaborative environments and custom solutions in the Discovery Studio field can teach other areas such as Materials.
In statistical analysis, we usually try to avoid bias. But in high-throughput screening (HTS), bias may be a good thing. In fact, it may be the reason that HTS works at all.
In his In the Pipeline blog, Derek Lowe discusses a new paper from Shoichet's group at UCSF, entitled "Quantifying biogenic bias in screening libraries." The question is this: Given that the number of possible organic compounds of reasonable size approaches the number of atoms in the universe (give or take a few orders of magnitude), and that an HTS run screens "only" a million or so compounds at a time, why does HTS ever yield any leads? The short answer, as the authors show, is that HTS libraries have a strong biogenic bias. In other words, the compounds in these libraries are much more similar to metabolites and natural products than are compounds randomly selected from chemical space.
The authors used Pipeline Pilot for much of their analysis, including ECFP_4 molecular fingerprints for the similarity calculations. See the paper and Derek Lowe's blog entry for more.
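The comparison metric usually paired with fingerprints like ECFP_4 is Tanimoto similarity. Here is a minimal sketch on toy bit sets; the bit indices are invented for illustration, and real fingerprints would of course come from a cheminformatics toolkit rather than be typed in by hand:

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two fingerprint bit sets:
    |A intersect B| / |A union B|."""
    if not fp_a and not fp_b:
        return 1.0  # two empty fingerprints: define as identical
    inter = len(fp_a & fp_b)
    return inter / (len(fp_a) + len(fp_b) - inter)

# Toy "fingerprints": sets of on-bit indices standing in for ECFP_4 output.
library_cpd = {1, 5, 9, 42, 77}
metabolite  = {1, 5, 9, 42, 100}
random_cpd  = {3, 8, 200}

bio_sim  = tanimoto(library_cpd, metabolite)   # shares 4 of 6 distinct bits
rand_sim = tanimoto(library_cpd, random_cpd)   # shares no bits at all
```

A biogenic-bias analysis in this spirit would score each library compound against a reference set of metabolites and natural products and compare the distribution to that of randomly selected chemical space.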
Simpler than the modeling approaches I mentioned in my earlier posting, these require only a statistical analysis of the data (some experimental results, some modeling output). The results reduce N-dimensional datasets to 2 or 3 dimensions that are “grasp-able” by mere humans. Applying these approaches to the apatite data clearly shows how the choices of cation and anion influence the stability of the crystal.
Just think how many other research problems we could understand if we had the tools to look at the data in the right way.
High-throughput experimentation has been a mainstay in pharmaceutical discovery since the mid-1990s. In a 1999 C&E News article (C&EN, vol. 77, pp. 33-48, March 8, 1999), this approach was hailed as the next great thing. Unfortunately, we chemists soon realized that quantity is no replacement for quality; a notable article in the WSJ, “Drug Industry’s Big Push into Technology Falls Short,” was critical of the approach.
At the time, I was working on a DOE-funded project (DE-FC26-02NT41218) for high-throughput catalyst discovery for NOx catalysis in lean diesel engines, together with GM and Engelhard (now BASF). In practice, our method was not to generate thousands of samples and hope for the best, but to screen fewer, carefully selected samples quickly and subject the “winners” to more sophisticated testing.
The approach employed in our NOx project was based on analysis of experimental data, design of experiment, and fitting response surfaces – and it worked. As pointed out in a recent BIO-IT World article (http://www.bio-itworld.com/2009/02/06/hts-retools.html), however, experimental data alone are usually too noisy to build reliable statistical models. What’s a researcher to do? Molecular modeling, of course – hey, I’m a modeler: you knew I was going to suggest that.
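Fitting a response surface amounts to ordinary least squares on a low-order polynomial model of the design factors. Here is a minimal numpy sketch; the two factors and the synthetic "conversion" data are hypothetical stand-ins for the catalyst measurements, not the project's actual data:

```python
import numpy as np

# Fit a quadratic response surface
#   y = b0 + b1*x1 + b2*x2 + b3*x1^2 + b4*x2^2 + b5*x1*x2
# to synthetic screening data by ordinary least squares.
rng = np.random.default_rng(0)
x1 = rng.uniform(-1, 1, 30)          # coded factor 1 (e.g. metal loading)
x2 = rng.uniform(-1, 1, 30)          # coded factor 2 (e.g. temperature)
true = 2.0 + 0.5 * x1 - 1.2 * x2 + 0.8 * x1**2
y = true + rng.normal(0, 0.05, 30)   # noisy "conversion" measurements

# Design matrix with all quadratic terms, then least-squares fit.
X = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
# coef should recover roughly [2.0, 0.5, -1.2, 0.8, 0.0, 0.0] up to noise
```

Once fitted, the surface can be interrogated cheaply: the stationary point of the quadratic suggests where the next, more carefully tested samples should sit.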
The key to success, it seems, is to employ a plurality of methods, both experimental and computational. Given even a modest amount of experimental data, you’ll need a database with decent search and query tools and basic statistical approaches like principal component analysis. But atomistic modeling is also important. Work by a number of research groups has shown that you can generate good predictive models from quantum mechanical (QM) methods for lots of different kinds of materials. (Keep in mind that these examples barely scratch the surface of the available literature.)
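As a small illustration of the statistical side, principal component analysis can be done directly with numpy's SVD, reducing an N-dimensional property table to the two components that carry the most variance. The data below are random stand-ins, not real measurements:

```python
import numpy as np

# Principal component analysis via SVD: project a 6-property table
# down to 2 components for plotting. Data are random stand-ins.
rng = np.random.default_rng(1)
data = rng.normal(size=(50, 6))        # 50 samples x 6 measured properties

centered = data - data.mean(axis=0)    # PCA requires mean-centered data
U, s, Vt = np.linalg.svd(centered, full_matrices=False)

scores = centered @ Vt[:2].T           # coordinates along the top 2 components
explained = (s**2) / (s**2).sum()      # fraction of variance per component
```

Plotting the two score columns gives exactly the kind of "grasp-able" 2D view of an N-dimensional dataset described above, with `explained` telling you how much information the picture retains.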
You can see in the examples above that the approach can actually work. But how do you figure out what QM calculations to perform, and how do you create good statistical models? Well, that’s a story for next month.