At the start of this year I wrote about the importance of spectroscopy in my blog post Spectroscopy: Where Theory Meets the Real World. Experimentally, spectroscopy is essential for the characterization of compounds, identification of unknowns, tracking of contaminants, in situ monitoring of chemical processes, and more. The modeling community has worked hard to develop reliable ways to predict spectra from first principles.
An interesting question came up during Q&A, and I wanted to highlight it here since I think it's of general interest and serves to underscore some of the less intuitive aspects of DFT (density functional theory). There are several different approximations that one can use to predict the energy of excited states. The simplest is to use the differences in the orbital eigenvalues. The question is why this is such a poor approximation in DFT, especially given that it's a reasonably good approximation in molecular orbital (MO) approaches like Hartree-Fock or AM1.
The answer lies in the difference between an MO-based approach and a density-based approach. The theory of DFT is built on the charge density, not the MOs. While the eigenvalues of Hartree-Fock theory are related to the energies of the orbitals, the eigenvalues in DFT are related to, well, something else. Write down the Hartree-Fock energy expression for a molecule and for its cation, then take the difference: you get the formula for the orbital eigenvalue. This is Koopmans' theorem, εi = E(neutral) - E(cation) (and you can read more about it here). When you do the same thing for the DFT energy, you discover Janak's theorem, εi = ∂E/∂ni. If you approximate this with a finite difference, εi ≈ ΔE/Δni, then it might look a bit like Koopmans' theorem, but the finite difference turns out to be a poor approximation to the infinitesimal derivative. You can read just how bad the approximation is in my 2008 paper.
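To make the contrast concrete, here are the two results written out side by side, together with the finite-difference step described above (same notation as in the text; Δni = -1 corresponds to removing one whole electron):

```latex
% Koopmans' theorem (Hartree-Fock): the eigenvalue equals an energy difference
\varepsilon_i = E(\text{neutral}) - E(\text{cation})

% Janak's theorem (DFT): the eigenvalue is a derivative with respect to occupation
\varepsilon_i = \frac{\partial E}{\partial n_i}

% Finite-difference approximation, removing one whole electron (\Delta n_i = -1)
\varepsilon_i \approx \frac{\Delta E}{\Delta n_i}
              = \frac{E(\text{cation}) - E(\text{neutral})}{-1}
              = E(\text{neutral}) - E(\text{cation})
```

The last line has exactly Koopmans' form, which is why the approximation is tempting. But the DFT energy is not a linear function of the occupation ni, so the slope at integer occupation can differ substantially from this one-electron chord, and that is where the approximation breaks down.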
Fortunately for us, we now have TD-DFT. As illustrated in Delley's original paper and in my own modest tests, this is a dramatic improvement over what was possible before. It increases the usefulness of this essential modeling tool and provides another way for theory to connect to the real world of observables. Stay tuned for more results in the coming weeks.
I’m excited to be chairing the International Conference on Chemical Structures along with Markus Wagener of MSD. This is the leading conference focusing on the handling of chemical structures in computer systems, specifically research and development in chemical structure processing, storage, retrieval, and use. The conference fosters cooperation among organizations and researchers involved in the merging fields of cheminformatics and bioinformatics and has a reputation for providing in-depth technical presentations with ample opportunities for one-on-one discussions with the presenters.
This year, the conference will be held June 5-9, 2011 at the beautiful Leeuwenhorst Congress Center in Noordwijkerhout, The Netherlands. The call for papers was issued last month, and we’re seeking papers on topics in cheminformatics, structure-activity and structure-property prediction, structure-based drug design and virtual screening, analysis of large chemistry spaces, integrated chemical information, and dealing with biological complexity. Abstracts are due January 31, 2011.
Pipeline Pilot 7.5 Component Update 4 is nearing completion. This update includes the new Pipeline Pilot Chemistry Cartridge, which forms part of the expanded Cheminformatics Collection. Also in Component Update 4, the updated List Management and Query Services system is set to wow with an enhanced form designer, usability improvements, a new favorites bar, and the ability to run protocol-based reporting and analysis tools directly from a form.
I’ve been working on the Pipeline Pilot Chemistry Cartridge and its associated Pipeline Pilot Components. The Pipeline Pilot Chemistry Cartridge is a new Oracle Data Cartridge built using the Pipeline Pilot Chemistry Toolkit. In the weeks leading up to the recent Accelrys European User Group meeting, I was busy timing the creation of a 123 million compound database using a beta version of the Pipeline Pilot Chemistry Cartridge. I never quite made it to my goal of 1 billion compounds in time for the User Group meeting, but in the coming weeks I hope to be able to update you on my attempt to index 1 billion compounds using the actual release version and to provide pointers for indexing your own chemical data.
Among the forward-looking technologies previewed at the European User Group Meeting in Barcelona last month was next generation sequencing. Richard Carter of Oxford Nanopore Technologies gave an overview of the impressive growth of this technology, which by many accounts is not just outpacing Moore's Law, but making Moore's Law look like it's flatlined! With throughput now doubling every five months or so, bioinformaticians and scientists must find better ways to integrate next generation sequencing data into their existing workflows.
In the video below, Richard provides more information on his talk, which described how Oxford Nanopore Technologies is using the Pipeline Pilot Next Generation Sequencing Collection to translate work from many of the prominent next-generation sequencing publications within Pipeline through relatively trivial protocols.
"What Pipeline Pilot gives us is a platform," Richard says. "I as a bioinformatician can write the protocols and roll that out to the bench scientists so that they can start asking their scientific questions of the data instead of asking the bioinformaticians to solve those problems for them. It's all about empowering your bench scientists. At the moment, next generation sequencing technology is an elite tool. If we can bring that to the masses, so much the better, and especially we believe at Oxford Nanopore Technologies the combination of simple technology that we're introducing with the benefits and simplicity of Pipeline Pilot will really spread the use of next generation sequencing."
Everything old is new again. It’s interesting to see ideas in the high-performance computation community coming back around after many years. In the late 1980s I used an array processor from FPS to speed up coupled-cluster calculations (very expensive and accurate electronic structure simulations). The array processor was bolted onto a VAX. The most compute-intensive parts of the application were shipped off to the FPS machine, where they ran a lot – maybe 50-100x – faster.
A couple of months ago I forwarded a tweet from HPC Wire, GPGPU Finds Its Groove in HPC, but a more recent article in Annual Reports in Computational Chemistry, “Quantum Chemistry on Graphics Processing Units”, is far less optimistic about this new hardware. Authors Götz, Wölfle, and Walker discuss Hartree-Fock and density functional theory and “outline the underlying problems and present the approaches which aim at exploiting the performance of the massively parallel GPU hardware.” So far quantum chemistry results on the GPU have been less than stunning.
China's Tianhe-1A supercomputer uses 7,168 Nvidia Tesla M2050 GPUs and 14,336 CPUs
The GPGPU approach is similar to the one we used in 1988 in Rob Bartlett’s research group. Routine calculations are done on the main microprocessor, but the most time-consuming computations are shipped off to specialized hardware (array processor then, GPU today). The programmer is required to insert directives into the code that tell the computer which hardware to use for which parts of the calculation. Usually a lot of code rewriting is required to achieve the speedups touted by the hardware suppliers. One of the major differences between then and now is the cost: our FPS machine cost on the order of $250,000, but a GPU board goes for around $300. And that’s one of the big reasons that people are so excited.
According to the article in HPC Wire, applications such as weather modeling and engineering simulation (ANSYS) have been ported, though ANSYS reported only a modest 2x speedup. The report in Annual Reports in Computational Chemistry, by contrast, points out the many challenges in getting quantum chemical (QC) calculations ported. In the case of our coupled-cluster calculations, we could formulate much of the problem as matrix-matrix and matrix-vector multiplications, which really fly on an array processor. Hartree-Fock and DFT programs – the QC methods most commonly in use by commercial molecular modelers – need to evaluate integrals over a basis set and perform matrix diagonalizations, and these operations are harder to port to the GPU.
GPUs hold tremendous potential to reduce the cost of high-performance computation, but it’s going to be a while before commercial QC products are widely available. In the meantime, stay hopeful, but at the same time stay realistic. Maybe the approach will work this time around, but it’s still too early to say.