As I touched on when I was musing about copy and paste, it never ceases to amaze me how much compute power has increased since I started doing modeling 15 years ago. Back then, I could minimize a few hundred atoms with classical mechanics and watch the geometry update every few seconds after each iteration. Now, I can minimize a few thousand molecules of the same size in a few seconds.
Of course, as computers get faster, the sorts of calculations that customers want to run also get larger and more “realistic.” The explosive growth of multi-core architectures has led to an increased focus on parallelization strategies to make the most efficient use of the CPU power. Traditionally, the main focus with high-performance codes has been on getting them to run in fine-grained parallel as efficiently as possible, scaling a single job over hundreds of CPUs. The issue here is that, if you are only studying a medium-sized system, you quickly reach a point where communication between nodes costs far more than the time spent in the CPU, leading to poor performance.
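To see why communication eventually kills fine-grained scaling, here is a toy strong-scaling model in Python. It is not Materials Studio's actual performance model, just an illustration under simple assumptions: compute time shrinks as 1/p while communication cost grows roughly linearly with the number of CPUs p.

```python
def parallel_time(p, work=100.0, comm_per_node=0.5):
    """Toy strong-scaling model (illustrative assumptions only):
    compute shrinks as 1/p, communication grows linearly with p."""
    return work / p + comm_per_node * p

def speedup(p, **kw):
    """Speedup relative to a single-CPU run of the same model."""
    return parallel_time(1, **kw) / parallel_time(p, **kw)

if __name__ == "__main__":
    # Speedup rises, peaks, then falls as communication dominates.
    for p in (1, 4, 16, 64, 256):
        print(f"{p:4d} CPUs -> speedup {speedup(p):6.2f}")
```

With these made-up numbers the speedup peaks somewhere in the tens of CPUs and then actually gets worse, which is exactly the behaviour you see on a medium-sized system.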
The solution, if you have multiple calculations to do, is to run them in a coarse-grained, or embarrassingly parallel, approach. In this mode, each individual calculation runs on a single CPU, but you occupy all your CPUs with multiple calculations. Materials Studio will do this, but it can be monotonous, although queuing systems help somewhat. The other issue is that when you start running many calculations, you also get lots of data back to analyze. Now you not only need a way to submit and control all the jobs, but also a way to automatically extract the results you want.
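The coarse-grained pattern itself is easy to sketch in plain Python (this is just the general idea, not how Pipeline Pilot or Materials Studio implement it): each worker process takes a whole independent calculation, and the pool keeps every core busy. The `minimize` function here is a hypothetical stand-in for a real job.

```python
from multiprocessing import Pool
import os

def minimize(cell_id):
    # Hypothetical stand-in for one independent calculation
    # (e.g. a geometry optimization on one cell); returns an
    # (id, result) pair so results can be matched to jobs.
    result = sum(i * i for i in range(10_000)) % 97
    return cell_id, result

if __name__ == "__main__":
    cells = range(8)
    # One worker per CPU: each worker runs whole calculations,
    # so there is no inter-node communication to pay for.
    with Pool(os.cpu_count()) as pool:
        results = dict(pool.map(minimize, cells))
    print(results)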
Luckily, the Materials Studio Collection in Pipeline Pilot has both of these capabilities. You can use the internal architecture to submit jobs in both fine- and coarse-grained parallel (or a mixture of the two if you have a nice cluster!). You can also use the reporting tools to extract the data you want and display it in a report.
A great example of this is working with classical simulations of polymers for the calculation of solubility parameters. Here, you need to build several different cells to sample phase space and then run identical calculations on each cell. You can quickly develop the workflow and then use the coarse-grained parallel options to submit a calculation to each CPU. When the calculations finish, a report of the solubility parameter can be generated automatically.
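The reporting step at the end of such a workflow amounts to aggregating one number per cell into a summary. A minimal sketch, with hypothetical delta values and nothing specific to the Materials Studio Collection's actual report format:

```python
import statistics

def solubility_report(deltas):
    """Summarize per-cell solubility parameters (hypothetical
    values in (J/cm^3)^0.5) the way an automated report might:
    averaging over cells samples phase space."""
    return {
        "n_cells": len(deltas),
        "mean": round(statistics.mean(deltas), 2),
        "stdev": round(statistics.stdev(deltas), 2),
    }

# e.g. one delta value from each of five independently built cells
cells = [16.8, 17.1, 16.5, 17.0, 16.7]
print(solubility_report(cells))
```

The spread across cells is worth reporting alongside the mean, since it tells you whether you have built enough cells to sample phase space adequately.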
So, whilst there is a definite need for an expert modeling client like Materials Studio, the Materials Studio Collection really does free you up to focus on problem solving (using Materials Studio!), not waiting for calculations to finish and then having to monotonously create reports!
We are now rolling into the last few days of the release of Materials Studio 5.0. It has been a pretty long but very exciting and challenging release as we really started to use sprint releases and agile techniques. From a customer perspective this meant that, from February onwards, early versions of the software have been going to some customers to get feedback.
During the 5.0 release cycle, we have been focusing on several large projects covering the range of technologies from quantum mechanics through classical simulations to mesoscale modelling. There is far too much to blog about in one post, so I will break this up into a few posts.
First, let’s turn back the clock to the start of the previous release! During the development of Materials Studio 4.4, we began work on the parallelization of Forcite Plus. There was no intention to release this functionality in 4.4, but we got tantalisingly close! However, we didn’t have time to include core functionality like COMPASS support, and more validation was needed. During the 5.0 release we finished off the functionality, and we worked closely with HP to test Forcite Plus scaling and found some interesting “features”. One of these was that the Ewald summation method did not scale well above 16 CPUs, which at first we couldn’t understand. After some head scratching, the classical simulations guys worked out that we had optimized Ewald so heavily for scalar calculations that this had a negative effect on parallel scaling. We didn’t want to lose the scalar performance, so we added a switch so that the algorithm changes over when running in parallel to give good scaling. One of our beta test customers has now seen pretty linear scaling for Ewald sums up to 128 CPUs!
Another big project in 5.0 has been the development of a new Amorphous Cell. Amorphous Cell is a core tool for polymer simulations, enabling the generation of bulk amorphous structures, and has been available in Materials Studio since 1.0. However, it needed a revamp as it had some serious limitations, including failing to build some pretty basic structures because it was tied to a specific forcefield. In fact, a few years back a customer memorably asked me to build a box of HCl and, unknowingly, I fell into the trap: it failed because the system was not parameterized, which they obviously knew!
Before starting the development, we had long discussions about the best approach to creating this essential tool and decided to re-use the packing technology from Sorption. This had some great advantages, as we could do really interesting things like pack into isosurfaces. It also delivered the most requested enhancement: removing the dependency on the CVFF forcefield, so now you can build any amorphous material. Again, early feedback suggested that having weight percent displayed for each component was really useful, so this was added in. Our beta testers were equally happy, as some could use the new packing task to build materials that were previously impossible, and others could pack polymers with side chains at realistic densities rather than having to build at a low density and compress. Of course, the first structure I built was a box of HCl – successfully this time!
Enough for now – later I’ll blog about some of the fantastic new quantum mechanics functionality.