Just reading a recent Financial Times, I noticed on the front page that German Chancellor Angela Merkel and her husband, Joachim Sauer, former director of the Catalyst Consortium (later part of Accelrys), were sharing what I guess was a scientific joke. It is interesting that through the initial part of Chancellor Merkel’s election campaign Joachim Sauer kept a very low profile, declining to give any interviews not related to his scientific work, but he has recently been seen more often in public with his wife. Joachim, a Professor of Quantum Chemistry and a long-time collaborator with Accelrys, has a strong technical background in advanced catalyst modeling. It is further interesting that in Germany, where arguably the modern chemical catalysis industry began, there is now a voice at the highest levels of government for advanced virtual chemical screening and analysis. I hope this is a sign of increased acceptance of, and interest in, the virtual chemical and quantum space at all levels of government.
Photo from the Financial Times of Joachim Sauer and his wife, German Chancellor Angela Merkel
Scott Markel’s article, “Drowning Research Scientists, Meet Life Preserver,” in the Sep 16, 2009 issue of Drug Discovery & Development, makes an impressive case for using pipelining technology in the bioinformatics research community and in the broader biomarker and translational research communities. As he points out, there will never be a one-size-fits-all research approach for these scientific communities. The sheer volume of data sources and of open-source and third-party integration opportunities just continues to grow, and Pipeline Pilot, a leader in data pipelining, is uniquely capable of handling this challenge.
I loved his conclusion: “Rather than relying on standard templates, users should be able to configure what they want to see and how it is presented. This degree of flexibility leaves room for the innovation so vital to these initiatives, while still providing a framework for faster decision-making and ultimately faster results.”
Scott is a Vice-President and member of the Board of Directors of the International Society for Computational Biology. Scott is also the head of ACCL’s talented biosciences R&D team and developer/architect extraordinaire. I get paid to work with him. Lucky me.
Informatics in High Content Screening (HCS) is reshaping the mix of scientists driving drug discovery efforts. In the early days of HCS I worked closely with electrical, mechanical, and software engineers to develop better systems for image acquisition and processing. My responsibilities as an HCS biologist involved painstaking hours of sample preparation and cell culture, and constant enhancements to my materials and methods for preparing biological specimens for imaging. I was motivated by the many new collaborative efforts that began with the software engineers, the systems engineers, and the machine vision scientists developing HCS systems. I found myself teaching basic concepts of biology as I learned about illumination and optics, piezoelectric drives for autofocusing and, of course, the strings of zeros and ones that would eventually tell me what happened to my protein. It was exciting for me to be part of a cross-functional team developing new applications by piecing together advances in hardware, image processing, and biological assay technologies.
High Content Screening systems and vendor software have come a long way since my introduction to the technology ten years ago. Vendors struggled to balance powerful, flexible systems with ease of use (1). The bottleneck has shifted from application development to data informatics. Software systems in HCS have evolved to integrate databases and other related sources for chemical structures, target characteristics, and assay results. Today, I collaborate with colleagues in HCS in new areas that include data mining, principal component analysis, Bayesian modeling, decision trees, and data management. The mix of HCS conference speakers and attendees has shifted from what had primarily been assay developers to a growing population of informaticians and IT experts. Talks have moved beyond assay design and system development to incorporate more downstream data processing. We have worked on complex fingerprinting methods for predicting characteristics of a compound, such as its mechanism of action or how it might affect a particular biological pathway involved, for example, in neuronal stem cell differentiation. Vendors are moving to more open systems for image processing and are integrating more third-party applications into their HCS acquisition systems to keep up with the shifting bottlenecks and emerging solutions. Informaticians have been able to improve data analysis efforts and significantly reduce the number of man-hours required for downstream data analysis (2). I've been fortunate to develop relationships with experts at most of the leading HCS instrument companies. My journey has been one of constant growth and continuous learning. I’m eager to know what’s coming next in High Content Screening and to keep learning from my ever-growing network of scientific experts.
1. High-Content Analysis: Balancing Power and Ease of Use by Jim Kling
2. Data Analysis For A High Content Assay Using Pipeline Pilot™: 8x Reduction in Manhours from a poster by L. Bleicher, Brain Cells Inc
Attendance for the first annual High Content Analysis East conference is looking great. There is quite a bit of focus at this meeting on data integration, analysis, and management. And what a great time! There’s been an explosion in informatics-related software for High Content Analysis this year. I’m expecting to see some really cool new informatics software features from several experts in the field, including Jason Swedlow (OME), Michael Sjaastad (MDS), Oliver Leven (GeneData), Neil Carragher (AstraZeneca), and the queen of High Content Analysis herself, Ann Hoffman (Roche).
Our own long-time HCA veteran Kurt Scudder will be presenting “Beyond Basic HCS Data Management: Learning, Modeling, and Advanced Data Analysis using Pipeline Pilot”. Stop by the booth (lucky #7) and say hi. I’ve got some whiz-bang applications for a High Content Analysis Integrated QC Drill Down, as well as easy-to-use single-cell and compound-modeling apps.
A number of recent (and not so recent) initiatives in Congress are designed to encourage production of ethanol as an alternative fuel, but how much is really feasible and how much is catering to eco-hype? A search of congressional bills for the 111th Congress turns up an astonishing 1,592 bills relating to “energy” and 150 relating to “renewable energy.” These bills do everything from providing tax credits for growing corn, to funding development of production facilities, to providing tax credits for consumers. But which options really make sense?
The debate over ethanol from corn has been going on for a while, and judging from the available information, there’s plenty to be concerned about. Patzek and Pimentel have published numbers suggesting that such production is a net energy loss. This is supported by an EPA report summarized in Chemical & Engineering News (C&EN), May 11, 2009 [sorry, you’ll need a subscription to read that]. A 2007 energy law set a production target of 36 billion gallons of biofuels by 2022. As reported in C&EN, the law requires a full life-cycle analysis that “reflects a growing concern that ethanol may result in higher CO2 emissions due to land-use practices, such as clearing rain forest…” And another recent C&EN article discussed the potholes on the road to commercial biofuels. According to the article, of the six cellulosic ethanol projects to receive DOE grants in 2007, none has been built, although one is under construction.
Despite the differences between the optimists and pessimists, I think they agree on one thing: the need for higher efficiency. Given the current efficiencies of biofuel production, internal combustion engines, and fuel cells, biofuels can’t reach the goals that we’ve set for them (e.g., 10 percent of electricity from renewable sources by 2012, and 25 percent by 2025). What is unquestionably needed is more fundamental research. To underscore the point, here are some of my favorite recent high points:
Similar work was presented by Robert Andersson at the 21st North American Catalysis Society meeting earlier this year. This work converted biomass to syngas (H2/CO) and subsequently to alcohols using heterogeneous catalysis.
These are just a few examples of the many fundamental advances that will be required to make biofuel sustainable and commercially viable.
Scientists regularly cry out for more fundamental research funding at the start of each federal budget cycle. The American Reinvestment and Recovery Act (ARRA) of 2009 provides for $4.6 billion in DOE grants for basic R&D. The latest congressional omnibus bill provides $151.1 billion in federal R&D, an increase of $6.8 billion, or 4.7 percent, above the FY 2008 value. This is a really good start. Let’s make sure that we use the money wisely.
In the field of machine learning, a binary classification model is a statistical method for assigning an object to one of two categories (classes): benign vs. malignant, active vs. inactive, crystalline vs. noncrystalline, and so on. We build these models to reduce the number of experiments we need to run or to reduce the human labor required to evaluate experimental data (such as image data). The models are rarely perfect—meaning that they generally assign at least some objects to the wrong category.
The ROC AUC score comes from a ROC plot, which is simply a plot of the true positive rate (sensitivity) against the false positive rate (1 − specificity). We generate the points on the plot by varying the cutoff value we apply to the model's output to distinguish between the predicted classes. (Note that most so-called classification models are at root ranking models, which output a numerical score corresponding to the relative likelihood of the object being in one class versus the other.) Here's a typical ROC plot:
ROC plot for Pipeline Pilot Bayesian model of NCI AIDS data
Each point on the plot tells you this: "For a true positive rate given by the Y axis value, the X axis value is the price you must pay in false positives." It is then up to you to decide what the best tradeoff is and to set the cutoff accordingly. Or you may decide that none of the points on the curve give you the combination of sensitivity and specificity you need, and that you need a better model.
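To make the cutoff-sweeping idea concrete, here is a minimal Python sketch (not the Pipeline Pilot implementation; the scores and labels are made-up illustration data) that generates the points of an ROC curve by lowering the cutoff one distinct score at a time:

```python
def roc_points(scores, labels):
    """Return (fpr, tpr) pairs, one per distinct cutoff value.

    scores: model outputs (higher = more likely positive)
    labels: true classes, 1 for positive and 0 for negative
    """
    pos = sum(labels)              # total true positives in the data
    neg = len(labels) - pos        # total true negatives in the data
    points = [(0.0, 0.0)]          # cutoff above every score: nothing predicted positive
    for cutoff in sorted(set(scores), reverse=True):
        # Everything scoring at or above the cutoff is predicted positive.
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / neg, tp / pos))
    return points

# Illustrative scores from a hypothetical ranking model.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,   0,   1,   0,   0]
print(roc_points(scores, labels))
```

Each returned pair is one point on the curve; plotting them (with a line to (1, 1)) reproduces the staircase shape seen in the figure.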
As you might infer from the name, the ROC AUC score is just the area under the ROC curve. It ranges in value from 0.5 for a model that's no better than random guessing to 1.0 for a perfect model. Unlike the other metrics I mentioned above, the ROC score is independent of any specific cutoff value. Because of this, its value is an intrinsic property of the model (for a given test set). It does not depend on any preference we might have for, say, reducing the number of false negatives at the price of more false positives. It gives us a single value that we can use to easily compare the performance of different classification methods or to tune the performance of a given method.
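A handy, standard equivalence (my addition, not from the text above): the ROC AUC equals the probability that a randomly chosen positive is ranked above a randomly chosen negative, with ties counting half. That makes it easy to compute without ever drawing the curve, as in this Python sketch on made-up data:

```python
def roc_auc(scores, labels):
    """ROC AUC via its rank-statistic interpretation: the fraction of
    (positive, negative) pairs in which the positive outscores the
    negative, counting ties as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A perfect ranking scores 1.0; a model that gives every object the
# same score is indistinguishable from random guessing and scores 0.5.
print(roc_auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))   # prints 1.0
```

Note that only the ordering of the scores matters, which is why the AUC is cutoff-independent.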
Why do we view life in 3D rather than 2D? I am not thinking of fundamental physics here (which may consider 12D or more); rather, what advantages do we gain by perceiving chemical and biological molecules as more than two-dimensional entities? Are there benefits to viewing synthetic schemes and reaction mechanisms in more than two dimensions? Likewise, can regarding proteins/DNA/RNA as more than sequences and collections of secondary structures bring us additional insights?
There is an argument to be made that the intersection of biology and chemistry is best viewed in 3D. Just ask your local crystallographer! Life really does happen in 3D. When trying to understand how simple molecules interact with their complex targets, three-dimensional visualization can be indispensable! Nearly 60,000 (and growing) PDB structures can’t be wrong.
There is no consensus on how “best” to view molecular structures or complexes, and therefore a number of solutions are available. Everyone has a personal preference; I find that DS Visualizer provides all the flexibility and customizability I would like … and it’s free to everyone. That said, I would love to know: what is your personal favorite visualizer, and why? Vote for your favorite over in the right sidebar, and explain why below in the comments.
Just back from the ACS meeting in Washington, and I was reading the Presidential Awards nominees and winners document from the EPA. What struck me in the list of excellent and erudite projects was the degree of chemical innovation that companies from Dow to P&G, BASF and Eli Lilly were pursuing and achieving.
What was interesting to me was that in these tough economic times it can be hard to justify and develop innovative chemical solutions, rather than just tweaking the process-parameters knob yet again. These companies had all gone beyond that.
Also, as someone who has worked in the materials simulation area for 20 years, and whose role can be summed up in the word “ideation,” I was stunned by the scope and creativity of these solutions. What particularly interested me was that many of these companies use virtual chemistry as part of their innovation process, since relative comparisons, mechanistic understanding, and “what if” questions can only be accurately and systematically probed using this approach.
The fact that:
improvements in scale inhibitors for power stations leading to lower energy usage,
benign corrosion inhibitors,
improved fuel cell polymer membranes for more efficient energy storage, and
improved chemical synthesis and coating materials
have all been studied, with innovations achieved using modeling approaches, shows a direction we could take to innovate our way out of our energy-dependency issues.
I look forward to seeing the 2010 nominees and hope that chemical simulation and informatics linked projects are key in that innovation path.