It’s been a long time since I was actively involved in scientific research – that was during my Ph.D., post-doc and early career. But I can still remember one of the most challenging tasks was getting to the library at least once a week to go through the new journals, find the relevant and interesting articles, and make copies to read “soon”. And that was after getting through the large backlog of journals which had already resulted in a large pile of papers that silently mocked me as they sat there, day after day, collecting dust on a corner of my desk. And then there are all the articles I had already read and filed, with their siren calls of “come back to me, there’s important information in here that would explain that new result…”. Electronic abstracting services were just becoming available (yes, I know I’m dating myself here), which streamlined the process a bit, but also meant access to more journals than our library subscribed to, only adding to the work! Consider then this chart showing the increase in PubMed journal articles year over year (almost a million articles published last year!). PubMed serves primarily life sciences, but you’ll see a similar pattern with other scientific and technology document sources.
So how can anyone possibly keep up to date with relevant published literature in the face of the growing mountain of articles? Specifically, with a finite amount of reading time, how can researchers find the most relevant papers to read? Well, that problem has been solved by Ian Stott from Unilever, who has developed an application that “knows” which journal articles are relevant to you and delivers them right to your email inbox! Not only that, but the app continually learns and adapts to your interests, based on your preferences. This app, which was recently presented by Ian in a webinar entitled “Informatics-Led Literature Service: Keeping up with the Data Deluge” (find the recording here), was built using the Accelrys Enterprise Platform and Pipeline Pilot, and in particular, features components from the Documents and Text Collection. Using out-of-the-box components and other standard tools, Ian built the app and has deployed it to his colleagues at Unilever. We had record-breaking attendance at the webinar, with many questions, proving the point that this is a major problem facing researchers.
Several attendees asked whether the same learning technique could be applied to patents and other documents. It certainly can! And since the app is built using Pipeline Pilot and the underlying platform, it is easy to apply it to other document types, or even have it process multiple types simultaneously. The Documents and Text Collection provides out-of-the-box connectors for PubMed and both US and WO patents. Connectors to commercial document databases can also be created.
One concern that some attendees raised was that if you rely on a learning approach such as this, as opposed to a more manual search, what might you be missing out on? Well, although Ian didn’t present this in the webinar, the statistical modeling techniques that he used allow those kinds of analyses to be performed, and the results will vary by document type and the extent of training that individual users undertake. Of course, any researcher can, and probably should, use a combination of methods to find key research articles.
So what do you think about using modeling techniques such as this to try and manage the “data deluge”? Do you have alternative strategies? What about actually extracting information out of the papers? Can that be automated as well, rather than having to read the papers? That’s a whole other discussion!
In any event, maybe with tools as good as this one that Ian developed, I might be tempted to try my hand at research again…do they still use mouth pipettes?
Recently, we released Collection Update 1 (CU1) for Pipeline Pilot 8.5, and along with many other updates to various component collections, we have updated the Professional (Pro) Client and released a new feature called “Protocol Comparison”. In software development it is very common to use a “diff” tool to compare two versions of code, to see where differences occur and determine if the differences should be kept or resolved. This can be very useful when working on a newer version of a program, or when combining the work of multiple developers on the same program into a single working version.
Well, Protocol Comparison is like “diffing” for Pipeline Pilot protocols. For two versions of a protocol, it allows you to see which components have been added, deleted, or modified. For modified components, there is a report that shows all the changes to that component, and the same information is provided at the protocol level itself. It also shows which pipes have been added or deleted, and we have incorporated a tool for comparing scripts and long parameter values. Typical scenarios where we think this will be valuable include reviewing recent changes to your protocol to decide which ones to keep and which to revert, and coordinating multiple authors working on the same protocol. You can also compare the same protocol deployed to different servers, to simplify the process of making changes to a protocol on a development server and deploying that protocol to a production server with confidence that only the desired changes have been made.
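Conceptually, comparing two versions of a protocol boils down to set operations over their components. Here is a toy sketch of that idea in Python; the component names and parameter values are invented for illustration, and the real feature of course works graphically on the full protocol definition rather than on simple dictionaries:

```python
# Toy sketch of the idea behind a protocol "diff": represent each protocol
# version as a mapping from component name to its parameter settings, then
# report which components were added, deleted, or modified between versions.

def diff_protocol(old, new):
    """Return (added, deleted, modified) component name lists."""
    added = sorted(set(new) - set(old))
    deleted = sorted(set(old) - set(new))
    modified = sorted(name for name in set(old) & set(new)
                      if old[name] != new[name])
    return added, deleted, modified

# Hypothetical protocol versions: v2 tightens a filter and adds a writer.
v1 = {"Reader": {"file": "data.sd"}, "Filter": {"expr": "MW < 500"}}
v2 = {"Reader": {"file": "data.sd"}, "Filter": {"expr": "MW < 450"},
      "Writer": {"file": "out.sd"}}

print(diff_protocol(v1, v2))  # (['Writer'], [], ['Filter'])
```

The real feature additionally diffs pipes, scripts and long parameter values, as described above, but the added/deleted/modified classification is the core of the comparison.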
Protocol Comparison is a feature that has been requested for some time, and although conceptually quite simple, proved to be a fairly challenging task to implement. We’re not aware of any other diff that makes comparisons in the graphical way that Protocol Comparison works, so we’re excited about it and we think this is a big advantage to anyone who uses Pipeline Pilot to create data processing pipelines. I hope you will use it, and provide us feedback about your experiences with it, and any further enhancements you’d like to see.
Today at the Materials Research Society meeting in Boston, Cyrus Wadia, the Assistant Director for Clean Energy and Materials Research and Development at the White House Office of Science and Technology Policy challenged the audience to “think as a network” of collaborators, not individual researchers, and to “learn from colleagues” how to better handle large data sets.
The recently publicized presidential Materials Genome initiative aims to accelerate the development and deployment of new materials that give US industry a production and technology advantage. This initiative mirrors exactly the focus Accelrys has on integrated multi-scale modeling and laboratory-to-plant optimization. At the heart of these initiatives is improving the estimation and management of materials innovation and design, from the nano scale right up to the process and macro production scale.
As many Accelrys users know, multi-scale modeling where the results of atomistic and micro-scale simulation can be combined with larger scale processes, high throughput screening and testing, and laboratory information yields valuable insights into materials design. The combination of computational and real analytical instruments alongside the organization of laboratory data allows the use of models at all length scales and can be considered as driving the innovation cycle of many companies.
The MGI initiative clearly recognizes this function, along with the idea of developing a framework and ontologies through which information, models and data can be exchanged at different levels and on different time scales. Making information more widely available to the engineer and materials scientist for collaboration will help drive more accurate and systematic materials development and usage. Accelrys is very excited about this initiative and is confident its technology will help accelerate the innovation lifecycle of new materials, which is a clearly stated goal of the initiative.
Due to the response to my recent post about how the Hit Explorer Operating System (HEOS) collaborative program is assisting in the treatment of neglected diseases, I've invited Frederic Bost, director of information services at SCYNEXIS, to talk a little bit more about HEOS and the project. It is with great pleasure that I welcome Fred to our blog!
Thank you Frank, it's great to have this opportunity to talk to your readers. We couldn't think of a better case for the HEOS® cloud-based collaborative platform than what we've seen with the committed scientific community engaged in the Drugs for Neglected Diseases initiative (DNDi). The project is grand in scope and comprises scientists spread over five continents representing different cultures, disciplines, processes and companies. In this way, it's a macrocosmic example of what happens in industrial pharma research.
Collaboration requires all team members to interact equally and as needed, regardless of their physical location, disciplinary background or expertise. We've set out to develop a platform that invites all scientists involved in a project to contribute any information that might benefit the team, especially when those scientists don't have the opportunity to interact frequently face-to-face. HEOS ensures that scientists can share whatever they deem relevant, be it a data point, a comment on another's work, an annotation, a document, a link from the web or a Pipeline Pilot protocol. The science or the data should never be compromised by external factors. For that reason, we embrace the motto of the DNDi -- and extend it: The Best Science (and the best supporting software) for the Most Neglected.
What does true collaboration look like? Here's an example from the DNDi project: The non-profit organization started a research program against an endemic disease by collecting small compound sets from volunteer large pharmaceutical and biotech companies. Assays were run by an expert screening company in Europe. While several of the programs proved to be dead ends, one showed promise. The non-profit organization hired an integrated drug discovery contract research organization (CRO) to produce additional analogs using high-throughput screening. Using HEOS, the biotech that provided the initial compounds was able to continue to manage the project while the CRO for high-throughput screening confirmed the most promising hits and leads. The managing biotech was also able to track in vivo studies performed by a US university.
As the program moved along, several ADME, safety and pharmacokinetic teams got involved in the project. Several peer organizations were also consulted on certain decisions. All these efforts successfully delivered a compound ready for the clinic that is today showing great promise in treating a disease for which a new treatment hasn't been produced in decades.
Managing this type of program, whether in a non-profit setting or an industrial one, demands flexible, rich features that can accommodate the needs of each partner at each stage of research while capturing data, keeping it secure and consolidating it so that it is available in real-time to authorized team members when they need it. Data must also be curated, validated and harmonized according to the rules that the project team has established and provided in a common language that enables scientists to compare results, whatever their origin. And because of the power of embedded Accelrys tools, HEOS can also provide the scientific analysis tools necessary to support the team in its decision process. All of these capabilities enable scientists to compare results and make decisions as a team.
It's been fascinating and rewarding to serve this community of passionate scientists fighting against endemic diseases. Together they have participated in an evolution, creating an agile networking environment that combines competencies and science from many places to achieve a common goal. HEOS has quite simply helped the DNDi's virtual teams function as if the world were much smaller than it really is.
My recent article in Bio-IT World discusses the need for a common computational platform in enterprise NGS deployments. The article touts the benefits of a platform that enables rapid integration of varied tools and data…a platform that lets bioinformaticians tailor NGS analyses to the needs of specific groups, that facilitates the sharing of computational best practices and accommodates rapidly evolving standards and hardware. In three words, a platform that is versatile, agile and scalable.
A deeper dive into how NGS data management and analysis are typically handled today makes a strong case for a common platform like this. Most life sciences organizations assign bioinformatics experts to particular therapeutic groups that want to use NGS. All too often, these experts write their own Perl or Python scripts to manage NGS data computation. The glaring problem: it’s hard enough to repurpose scripts you wrote last week, let alone expect someone else to understand and re-deploy scripts you wrote six months ago.
A case in point: one of our large pharma customers has built up, over several years, a substantial library of Perl scripts for managing and massaging NGS data. So much is invested in these scripts that people are dedicated to supporting their use in other parts of the organization. The same scripts might have utility first in oncology, then later in neurodegenerative disease or infectious disease research. And the inevitable questions follow: what is the optimal parameterization of these scripts, say, for short-read data with lots of repeats? Or for data that may have large numbers of rearrangements? How do I know the scripts are appropriate for my research? And how do I reconcile their results with results I get using other methods? The bottom line: the company is expending an inordinate amount of time, money and resources supporting the use of Perl scripts across the enterprise.
A better approach, and one our customer is implementing, is twofold: first, they are wrapping these scripts individually as separate Pipeline Pilot components and providing help documentation at the component level so that other informaticians can use them more efficiently; second, they are creating “best practice” protocols using both the componentized scripts and components from the NGS Collection, together with customized protocol documentation, so that researchers in different groups can use these protocols more easily in a variety of computational contexts. Instead of wrestling with raw Perl scripts that often raise more questions than answers, researchers have the benefit of “plug-and-play” components, like Lego blocks, that harmonize and accelerate NGS analysis.
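The essence of the wrapping step can be sketched in plain Python (the component here is not Pipeline Pilot code; the script name, flags and parameters are all hypothetical, invented to illustrate the pattern): the legacy command line disappears behind a documented function whose named parameters encode the vetted parameterization, so colleagues no longer need to decipher raw flags.

```python
# Sketch of wrapping a legacy command-line script behind a documented,
# parameterized interface. "align_reads.pl" and its flags are hypothetical.
import subprocess

def build_command(reads_file, min_quality=20, allow_rearrangements=False):
    """Assemble the command line for the legacy alignment script.

    reads_file: path to the input reads file.
    min_quality: minimum base quality to keep (default chosen by the experts
                 who wrote the script, so callers need not guess).
    allow_rearrangements: enable the slower mode for rearrangement-rich data.
    """
    cmd = ["perl", "align_reads.pl", "--in", reads_file,
           "--min-q", str(min_quality)]
    if allow_rearrangements:
        cmd.append("--rearrange")
    return cmd

def align_short_reads(**kwargs):
    """Run the wrapped script and raise on failure."""
    return subprocess.run(build_command(**kwargs),
                          capture_output=True, text=True, check=True)
```

The documentation and defaults travel with the wrapper, which is exactly what answers the "what is the optimal parameterization?" questions described above.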
A plethora of Perl/Python scripts and desktop software programs is problematic in today’s dynamic and data-rich NGS environment. With so many ways to interact with the data, it’s next to impossible to efficiently leverage scripts developed by other people for use in other contexts without some sort of shared computational framework. With Pipeline Pilot, on the other hand, researchers can publish clearly documented protocols through a Web interface and be assured that everybody else doing that kind of analysis is doing it in the same way. This common underlying computational model provides organizational scalability for the work of individual experts. Once everybody sees what the model is, even if they continue to use scripts (which many will), they’re at least aligned with a well understood NGS platform that can be deployed and shared by others across the organization.
What’s your greatest challenge and opportunity in managing NGS data and computational pipelines today? What are you looking forward to dealing with tomorrow?
In my last blog, I talked about how improved global collaboration in the Cloud is not only improving Neglected Diseases research but also the "exuberance quotient" of science. Today our current economic woes tied to the sovereign debt crisis have got me thinking about the darker cloud hovering over researchers today, one that may very well threaten "exuberant" science in the months ahead, especially in university labs.
There’s no way you can look at today’s economic situation and postulate that government funding of scientific research in academic labs is going anywhere but down. It stands to reason that this will change behavior and drive academia to find funding alternatives for its research.
First, university labs will need new ways to collaborate externally, not only with colleagues at other institutions but with those at the many commercial companies that will likely end up funding more and more academic research as government sources dry up. Second, they will need viable channels for commercializing the technology they develop, so that new applications, protocols and processes emerging from university labs become readily available to the wider scientific community (while also providing a return revenue stream supporting university research). Last but not least, with university researchers under increasing pressure to publish results, secure patents and acquire grants in the face of shrinking budgets and resources, they need simplified access to affordable software and services -- and we just took steps towards that end with our recently announced academic program.
This new academic paradigm and resulting wish list become much more achievable when university researchers deploy their technology on a scientific informatics platform that’s already widely used in the commercial world. This provides a built-in installed base and ready market for workflows and protocols. A widely deployed platform with the ability to capture a protocol as a set of XML definitions enables scientists working in the same environment to replicate an experiment or calculation with drag-and-drop simplicity and precision. If you start with the same data set, you end with the same results. Experiments are more reproducible, academic papers more credible and, most importantly, non-experts can advance their research using robust, expert workflows.
Academic researchers drive innovation that impacts the larger scientific community, but getting the innovation out there is still a challenge. In this regard, an industry-standard platform can also serve as the basis for an innovative new marketplace, a kind of scientific application exchange, where academics and their partners can expose their breakthrough technologies to a wider audience—and even charge a fee for using them. In the present economy, this new channel could provide much needed additional funding and a feedback loop for academic groups, enabling them to continue their vital research.
What are your thoughts on surviving—and perhaps even thriving—in today’s down economy?
Neglected Diseases like malaria, Chagas, schistosomiasis and human African trypanosomiasis (sleeping sickness) affect millions of people in the developing world. Drugs currently used to treat these diseases are of limited availability and efficacy. They’re also costly, often based on old molecules, and some have severe toxic effects. Even more worrying, drug resistance is emerging in several infectious diseases. The bottom line is: a coordinated, global campaign investigating therapeutics for Neglected Diseases is a critical imperative.
When SCYNEXIS approached us to donate software licenses and be a part of the cure, the Accelrys executive team readily agreed, and I’m so happy and quite proud to be with an organization that’s a part of this worthy effort.
A collaboration involving Accelrys, SCYNEXIS and Tibco Software is now providing a way for scientists around the world to work together on Neglected Diseases. It’s already helping to change the way science is done today and also creating the possibility for new economic opportunities for under-resourced labs in developing countries.
The scientific collaboration consists of SCYNEXIS’ SaaS-based platform for drug discovery—the Hit Explorer Operating System (HEOS®)—which provides hosted data for several not-for-profit, public-private partnerships (PPPs) that are leading the charge against Neglected Diseases. Accelrys contributes Pipeline Pilot for moving data around, running calculations and assembling reports, while our chemical registration software builds the chemical registry. Completing the system, Tibco Spotfire Analytics provides visual analysis tools enabling scientists to interact with their data in real time. The collaborations are truly global in nature, and HEOS® allows real-time sharing of data.
Our Neglected Diseases collaboration has resulted in at least two “Eureka!” moments for me. First, I’m intrigued by the geographical distribution of the scientists using the system. Thanks to the hosted HEOS® platform, Principal Investigators (PIs) in Brazil, Ivory Coast, the Philippines, South Africa, Zimbabwe and many other countries have come together in a vibrant virtual research community. Like other social and professional networks today, this virtual community is empowering isolated researchers as never before, making them part of a larger team, much like big pharma. The hosted system is also increasing the importance of these researchers’ work by making it widely available to their colleagues around the world. A participating researcher recently told me: “My molecules matter, now that they’re part of the larger collection. So what if I only contribute a few… one of them could be a winner someday, which means my work is important now.”
The other thing I find interesting is the remarkably diverse chemistry that is emerging from the project. With so many disparate molecules from so many different places now available for testing against screens, it’s easier for scientists to “jump the chasm” when assessing activity because they’re not locked into only a couple of series. The number of compounds, data points and disease targets is growing every year (see figure).
One of the major benefits of a global project like this is untainted perspective: the ability to move beyond fixed ideas and preconceptions to fresh insights. The far-flung researchers now contributing to the HEOS® database bring an unabashed passion to their search for answers. Let’s face it: the diseases they’re researching are endemic to their localities, often touching neighbors, friends and family. When motivated scientists are empowered to make a difference, everything becomes possible, from important scientific breakthroughs resulting from better sharing of data to improved viability for the labs providing the data. For example, hosted environments like HEOS® can simplify the process of registering molecules in industry-standard sourcing databases like the Available Chemicals Directory. Under-resourced labs in economically challenged regions can become more sustainable by selling the molecules they discover.
The Neglected Diseases project demonstrates that scientific data can be stored securely and shared globally on a thin client. However, the real takeaway message is more compelling than this technical accomplishment. The real value is: Improved global collaboration in the cloud is empowering researchers in developing regions by making their work available to—and important to—the wider research community. This is not only changing the way we do science; it’s increasing the exuberance quotient of science for many of us.
What’s your vision for scientific collaboration in the cloud?
Next Generation Sequencing produces huge quantities of data, currently up to 60 million sequences per file. Algorithms used to analyse these data load all the information from one file into computer memory in order to process it. With the growth in data volumes, these algorithms are beginning to slow down. This is a problem noted for algorithms which detect new forms of RNA and quantify them in RNA sequencing experiments.
In his talk at the 'High Throughput Sequencing Special Interest Group' (HitSIG), Adam Roberts from Berkeley, CA discussed his new online algorithm 'EXPRESS', designed to interpret RNA sequencing data (Roberts and Pachter, 2011 in press, Bioinformatics).
Online algorithms process data arriving in real time. The models generated are updated one sequence at a time. Therefore, the amount of memory required stays constant whatever the volume of data processed, and there is no need to store the data unless it will be analysed again later.
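The simplest illustration of this pattern is Welford's classic one-pass method for the mean and variance: each incoming value updates a few running statistics and is then discarded, so memory use never grows. (This is just a minimal sketch of the online principle; 'EXPRESS' applies the same idea with a far more sophisticated statistical model.)

```python
# Welford's online algorithm: constant memory however many values arrive.
class RunningStats:
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0   # sum of squared deviations from the running mean

    def update(self, x):
        """Fold one new observation into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        """Sample variance of everything seen so far."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

stats = RunningStats()
for read_length in [100, 102, 98, 101]:   # imagine 60 million of these
    stats.update(read_length)
print(stats.mean)   # 100.25
```

Only three numbers (count, mean, sum of squared deviations) are kept in memory, no matter how many sequences stream past.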
Online algorithms would fit very naturally in Pipeline Pilot data pipelines. They would also fit well with the new real-time sequencing technologies such as the Oxford Nanopore GridION system. The GridION system already uses Pipeline Pilot to control its 'Run until... sufficient' workflows.
Bringing all three technologies together would allow data interpretation to be generated directly from the sequencing machine and the flood of data could be directed straight into the most useful channels.
Fig 1: An RNA sequencing experiment showing a known and a newly discovered form of RNA, and the depth of the sequences used to identify them, along one region of the mouse genome.
For those who are attending or thinking of attending, here’s a quick preview of what I plan to cover and why:
A Lap Around Pipeline Pilot 8.5 for IT Professionals
There’s lots of great stuff that we’ve been working on within the next release of Pipeline Pilot, but there are also many capabilities that IT professionals are looking for that have been in the product for many releases. During this session, I’ll go over various tools within the product that every IT person deploying or operating Pipeline Pilot should know.
For those of you thinking, “I’ve been doing Pipeline Pilot for many moons. What can I learn from this session?”, rest assured that some new things within Pipeline Pilot 8.5 will be discussed. Regardless of your experience with Pipeline Pilot, I’m sure you’ll walk away having learned something new that you’ll put into practice.
Pipeline Pilot Application Lifecycle Management
Scientists who develop tools and solutions within Pipeline Pilot are quite familiar with how easy it is to get started building something valuable. But there’s a big step between building something useful for themselves or a small group of users and taking it to an enterprise-wide deployment.
So what does the creator of a Pipeline Pilot-based innovation need to add to their project to make it an enterprise success? What does an IT professional need to do to provide an enterprise infrastructure that delivers those scientific innovations? Answering those questions and more is the focus of this session. Do you have more questions that you’d like answered? Ask us within the Accelrys IT-Dev Community!
Pipeline Pilot Architecture Deep Dive
While this isn’t my session, I would like to advertise it a bit. Jason Benedict is our Senior Architect for Pipeline Pilot and he’ll be delivering this session. He did this same session at the ATS in April, and it was mind blowing. I get the luxury of sitting a few doors down from him and I get nuggets of great information about how Pipeline Pilot works deep within the product and why we did things a certain way. Many times this lands on a whiteboard, but often doesn’t get out into the wild. Well now is your chance to get this information first hand! He covers topics such as how Pipeline Pilot takes your request to perform a job, and what the system does to honor that job. This spans from authentication and security to memory management and data processing.
Pipeline Pilot 7.5 Component Update 4 is nearing completion. This update includes the new Pipeline Pilot Chemistry Cartridge, which forms part of the expanded Cheminformatics Collection. Also in Component Update 4, the updated List Management and Query Services system is set to wow with an enhanced form designer, usability improvements, the ability to run protocol-based reporting and analysis tools directly from a form, and a new favorites bar.
I’ve been working on the Pipeline Pilot Chemistry Cartridge and its associated Pipeline Pilot Components. The Pipeline Pilot Chemistry Cartridge is a new Oracle Data Cartridge built using the Pipeline Pilot Chemistry Toolkit. In the weeks leading up to the recent Accelrys European User Group meeting, I was busy timing the creation of a 123 million compound database using a beta version of the Pipeline Pilot Chemistry Cartridge. I never quite made it to my goal of 1 billion compounds in time for the User Group meeting, but in the coming weeks I hope to be able to update you on my attempt to index 1 billion compounds using the actual release version and to provide pointers for indexing your own chemical data.
Among the forward-looking technologies previewed at the European User Group Meeting in Barcelona last month was next generation sequencing. Richard Carter of Oxford Nanopore Technologies overviewed the impressive growth of this technology, which by many accounts is not just outpacing Moore's Law, but making Moore's Law look like it's flatlined! With throughput now doubling every five months or so, bioinformaticians and scientists must find better ways to integrate next generation sequencing data into their existing workflows.
In the video below, Richard provides more information on his talk, which described how Oxford Nanopore Technologies is using the Pipeline Pilot Next Generation Sequencing Collection to translate work from many of the prominent next-generation sequencing publications into relatively trivial Pipeline Pilot protocols.
"What Pipeline Pilot gives us is a platform," Richard says. "I as a bioinformatician can write the protocols and roll that out to the bench scientists so that they can start asking their scientific questions of the data instead of asking the bioinformaticians to solve those problems for them. It's all about empowering your bench scientists. At the moment, next generation sequencing technology is an elite tool. If we can bring that to the masses, so much the better, and especially we believe at Oxford Nanopore Technologies the combination of simple technology that we're introducing with the benefits and simplicity of Pipeline Pilot will really spread the use of next generation sequencing."
At Accelrys, we believe it’s important to hear from our customers and gain insight on how you use our solutions so that we can improve our products and services. So if you use Pipeline Pilot, either as a user of developed protocols or a developer of said protocols, we invite you to take a short survey on your experiences:
What’s in it for you? A chance to win a free iPad in exchange for about 15 minutes of your time. We’ll select the lucky winner from the survey respondents.
What’s in it for us? Insights into how you use Pipeline Pilot and how the software benefits you and your organization. All responses will be held confidential, though we do plan to aggregate the responses to help customers compare their usage with that of their peers.
The survey will be active until October 15, 2010. I hope you’ll check it out!
The randomized controlled trial is the gold standard for evaluating the effects of an intervention or treatment, whether in a clinical, laboratory, or other setting. A key requirement is to assign experimental subjects to treatment and control groups in a way that maximizes the statistical power of the study while minimizing bias. The most popular method of group assignment is randomization, including variants such as stratified, permuted block, and biased-coin randomization. The latter are designed to get better statistical balance between groups, especially for small trials. An alternative approach with some strong advocates is minimization, which may be partly random or fully deterministic depending on how it is implemented. But might there be a useful third alternative applicable to some types of trials?
The above thought was inspired by the following problem that an Accelrys field scientist presented to me: Given 50 animal subjects with varying body mass, how can they be divided into 5 equal-sized groups such that the body mass mean and variance are roughly the same for each group? In other words, we want the distribution of body mass within each group to be nearly the same. To both of us, this looked like a Pareto problem in which the variance of the within-group mean (variance-of-mean) and the variance of the within-group variance (variance-of-variance) across groups should be simultaneously minimized.
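To make the two criteria concrete, here is a minimal Python sketch of how a candidate partition might be scored. The original work used a Pipeline Pilot protocol; this standalone snippet, with illustrative body-mass data, is only an assumption of how the objectives could be computed:

```python
import random
import statistics

def partition_objectives(groups):
    """The two Pareto criteria for a candidate partition: the variance
    across groups of the within-group means (variance-of-mean), and the
    variance across groups of the within-group variances
    (variance-of-variance). Both should be driven toward zero."""
    means = [statistics.mean(g) for g in groups]
    variances = [statistics.variance(g) for g in groups]
    return statistics.variance(means), statistics.variance(variances)

# 50 hypothetical subjects with varying body mass, split into 5 groups of 10
rng = random.Random(1)
masses = [rng.gauss(30.0, 5.0) for _ in range(50)]
groups = [masses[i * 10:(i + 1) * 10] for i in range(5)]
vom, vov = partition_objectives(groups)
```

A perfectly balanced partition, in which every group has the same mean and the same variance, scores (0, 0) on both objectives; the optimization searches for assignments that approach that point.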
I implemented a simple genetic algorithm in a Pipeline Pilot protocol to do a Pareto optimization for this problem. Here's what the typical results look like in a tradeoff plot showing the Pareto results after 500 iterations, compared to results for randomization and minimization (both constrained to result in equal group sizes):
Variance-of-variance vs. variance-of-mean for alternative methods of assigning subjects to groups
Each color/symbol represents a different approach, with results from Pareto optimization shown as blue circles; results from multiple random partitionings shown as black stars; and results from minimizations with the subjects processed in different random orders shown as red triangles. Observe that for this very simple problem, the Pareto approach gives the lowest variance-of-mean and variance-of-variance values.
Might Pareto optimization be an alternative to traditional randomization and minimization for trial design? I don't know. Given the constraints of the approach, it may be more applicable to lab studies than to clinical trials. But these exploratory results look intriguing. I provide the protocol along with some more thoughts in an Accelrys Community forum posting.
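For readers without access to Pipeline Pilot who want the flavor of the approach, below is a self-contained Python sketch of a mutation-only Pareto genetic algorithm for this partitioning problem. The representation, operators, and parameters here are illustrative assumptions, not the actual protocol shared on the forum:

```python
import random
import statistics

def objectives(perm, masses, n_groups=5):
    """Score a partition: a permutation of subject indices is read as
    n_groups consecutive, equal-sized blocks. Returns (variance-of-mean,
    variance-of-variance) across the groups, both to be minimized."""
    size = len(perm) // n_groups
    groups = [[masses[i] for i in perm[g * size:(g + 1) * size]]
              for g in range(n_groups)]
    means = [statistics.mean(g) for g in groups]
    varis = [statistics.variance(g) for g in groups]
    return statistics.variance(means), statistics.variance(varis)

def dominates(a, b):
    """Weak Pareto dominance: a is at least as good on both objectives
    and strictly better on at least one."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b

def pareto_front(pop, masses):
    """Return the non-dominated (score, permutation) pairs in pop."""
    scored = [(objectives(p, masses), p) for p in pop]
    return [(s, p) for s, p in scored
            if not any(dominates(t, s) for t, _ in scored)]

def pareto_ga(masses, pop_size=30, iterations=500, seed=0):
    """Mutation-only Pareto GA: each iteration copies one parent,
    swaps two subjects (moving them between groups, so group sizes
    stay equal), then keeps the non-dominated set, padded with random
    survivors, as the next population."""
    rng = random.Random(seed)
    n = len(masses)
    pop = []
    for _ in range(pop_size):
        perm = list(range(n))
        rng.shuffle(perm)
        pop.append(perm)
    for _ in range(iterations):
        child = rng.choice(pop)[:]
        i, j = rng.sample(range(n), 2)
        child[i], child[j] = child[j], child[i]
        pop.append(child)
        survivors = [p for _, p in pareto_front(pop, masses)]
        while len(survivors) < pop_size:
            survivors.append(rng.choice(pop))
        pop = survivors[:pop_size]
    return pareto_front(pop, masses)

# 50 hypothetical subjects, 5 groups of 10
rng = random.Random(42)
masses = [rng.gauss(30.0, 5.0) for _ in range(50)]
front = pareto_ga(masses)
```

Because the swap mutation preserves the permutation, the equal-group-size constraint holds by construction, which is the same constraint imposed on the randomization and minimization comparisons above.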
Accelrys was featured this month in a Bio-ITWorld article titled Workflow’s Towering Aspirations. In reporting the challenges facing creators of data pipelining/workflow tools, John Russell raises several questions regarding the role that tools like Pipeline Pilot play in advancing drug discovery.
The central question Russell raises is “how far workflow tools can grow beyond personal productivity instruments into what Frost & Sullivan has termed scientific business intelligence platforms serving the enterprise.” I agree this question is key, but it is sometimes misinterpreted. “Serving the enterprise” is often equated with moving from providing software for an individual scientist to an ill-defined “enterprise user” and some fuzzy notion of increased organizational performance. We don’t necessarily see it that way.
Let’s say I’m a scientist at a pharma or biotech company, and I create a Pipeline Pilot data analysis protocol to automate and standardize what I previously did manually. That benefits me, but I can also share that protocol with my colleague sitting next to me, and she gains a similar benefit. Well, why should seating proximity determine who gains from my standardized process? In fact I can share my protocol widely with colleagues throughout my organization, across geographies and time-zones. So now we see clearly that the organization benefits not only from collectively enhanced personal productivity, but also globally, because the standardized application of analytical methods leads to more consistent data that ultimately supports better decision-making across projects and over time.
This is a key element of what we consider “scientific business intelligence.” More and more, the ability to transfer expertise across the enterprise has become a requirement. Data pipelining and workflow naturally facilitate master data management and good practices for data stewardship that benefit organizations internally and in their necessary collaborations with CROs and academics. These systems help organizations define consistent methodologies that can be extended and enforced both within and between organizations to ensure that data sets are comparable regardless of who creates them. Pipeline Pilot also scales while conforming to IT policies around security and authorization, which supports the growing demands for personal productivity on an enterprise scale. It’s this “enterprise readiness” that has made Pipeline Pilot the pipelining tool of choice at 20 of the top pharma companies.
Russell also notes the apparent hype associated with data pipelining/workflow tools. I’m not sure we really see or hear an excessive amount of “hype,” but some of the quotes in the article are telling. “The best possible data management tools would provide 10% of the solution, at most.” Whether this is the right value or not, it is a fact that significant inefficiencies remain in our industry due to the continued use of Excel copy-and-paste and similar manual, repetitive, error-prone processes. As another quote states, “Eventually, workflow tools can and should supplant many data analysis processes that are currently done with a combination of fragile spreadsheets and marginally re-usable macros.” This is consistent with estimates that more than half of a researcher’s time is spent performing these manual, data processing tasks—tasks that are highly amenable to being coded into workflows such as Pipeline Pilot protocols (e.g., R&D Productivity Analysis, Yazdani, Holmes, A.D. Little).
So we shouldn’t be thinking in terms of how much of the business of drug discovery (or other science and technology-based research) can be provided by workflow tools, but rather the extent to which these tools can minimize the time spent by scientists performing routine tasks and free them up to actually innovate and be productive within their organizations. This is, I believe, one of the “towering aspirations” of workflow tools. If we could reduce the time spent in low-productivity tasks to an amount closer to 10%, it would provide a massive benefit to research organizations. Wouldn't this make workflow tools worthy of just a bit of hype?
Offering insight from the perspective of a Pipeline Pilot and Materials Studio user, Accelrys is pleased to host a posting written by guest blogger Dr. Misbah Sarwar, Research Scientist at Johnson Matthey. Dr. Sarwar recently completed a collaboration project focused on fuel cell catalyst discovery and will share her results in an upcoming webinar. This post provides a sneak peek into her findings...
“In recent years there has been a lot of interest in fuel cells as a future ‘green’ power source, particularly for use in cars, which could revolutionize the way we travel. A (Proton Exchange Membrane) fuel cell uses hydrogen as a fuel source and oxygen (from air), which react to produce water and electricity. However, we are still some time away from driving fuel cell cars, as there are many issues that need to be overcome for this technology to become commercially viable. These include improving the stability and reactivity of the catalysts as well as lowering their cost, which can potentially be achieved by alloying, but identifying the correct combinations and ratios of metals is key. This is a huge task, as there are potentially thousands of different combinations, and one where modeling can play a crucial role.
As part of the iCatDesign project, a three-year collaboration with Accelrys and CMR Fuel Cells funded by the UK Technology Strategy Board, we screened hundreds of metal combinations using plane wave CASTEP calculations.
In terms of stability, understanding the surface composition in the fuel cell environment is key. Predicting activity usually involves calculating barriers to each of the steps in the reaction, which is extremely time consuming and not really suited to a screening approach. Could we avoid these calculations and predict the activity of the catalyst based on adsorption energies or some fundamental surface property? Of course, these predictions would have to be validated; alongside the modeling work, an experimental team at JM worked on synthesizing, characterizing and testing the catalysts for stability and activity.
The prospect of setting up hundreds of calculations, monitoring them and then analyzing the results seemed to us quite daunting, and it was clear that some automation was required both to set up the calculations and to process the results quickly. Using Pipeline Pilot technology (now part of the Materials Studio Collection), we developed protocols to process the calculations, along with statistical analysis tools to establish correlations between materials composition, stability and reactivity. The results are available to all partners through a customized web interface.
The protocols have been invaluable as data can be processed at the click of a button and customized charts produced in seconds. The timesaving is immense, saving days of endless copying, pasting and manipulating data in spreadsheets, not to mention minimizing human error, leaving us to do the more interesting task of thinking about the science behind the results. I look forward to sharing these results and describing the tools used to obtain them in more detail in the webinar, Fuel Cell Catalyst Discovery with the Materials Studio Collection, on 21st July.”