Mike Elliott’s most recent article in Scientific Computing provides an excellent overview of agile development, the software roll-out strategy that is becoming a de facto industry practice. Mike correctly concludes that iterative, time-based methodologies like agile are essential to help organizations manage risk, demonstrate success, and clear users’ resistance barriers. But just using agile won’t guarantee success: visionary management and a solid implementation strategy must be in place to avoid ad hoc, unproductive execution.

The biggest impact on an agile roll-out comes from two things: identifying the right stakeholders and the right sequence of iterative steps (or “sprints,” to use the agile vocabulary). To determine which scientists to roll an ELN out to first, we advise dividing potential users into the following groups:

  • Scientists who could support a “quick win” ELN deployment

  • Strategic scientists who offer an opportunity to deliver high value productivity gains from ELN use

  • All other paper notebook users, prioritized if necessary by the size of the implementation, the length of the implementation, and the value of their organizational role and anticipated ELN use

Next, define short sprints, each with a clear, demonstrable, well articulated deliverable. It’s essential that you build in time to stop at the end of each sprint, assess progress and feedback, and use this assessment to define the strategy and goal of ensuing sprints. This iterative stop, listen, learn, and improve approach ensures the right configuration is rolled out in the right way to scientists—and it takes less time than pressing ahead with something less than optimal.


So what might an actual agile ELN development look like? You might start with a few quick wins among the first group of scientists. This has two effects. First, you can identify champions who will promote the ELN to their peers. Second, it prevents strategic scientists from getting distracted by early “teething” issues—the quick win opportunities will usually work out these kinks. Using the lessons learned from the quick wins, you can develop some high value productivity sprints engaging the strategic scientists. Then, as time, budget, and priorities allow, you’ll roll the ELN out to the rest of the organization.


Ben Lass has provided a great series covering ELN implementation from start to finish that offers more details on the approach I’ve described. Just one more point: When it’s necessary to deliver an ELN in functional phases, I recommend thinking in terms of paperless vs. convergent. Begin by moving all scientists to a paperless paradigm. Most vendors offer ELNs that can deliver a paperless environment with minimal configuration and extension. Once you’re paperless, begin integrating existing instruments and software into the ELN to address specific scientific workflows. This, of course, requires that you select an ELN vendor that can handle this phased approach—one that offers an ELN that provides both a paperless option and hooks to other systems in the larger electronic lab environment.

Categories: Electronic Lab Notebook, The IT Perspective Tags: eln, workflow-integration, agile-development

The randomized controlled trial is the gold standard for evaluating the effects of an intervention or treatment, whether in a clinical, laboratory, or other setting. A key requirement is to assign experimental subjects to treatment and control groups in a way that maximizes the statistical power of the study while minimizing bias. The most popular method of group assignment is randomization, including variants such as stratified, permuted block, and biased-coin randomization. The latter are designed to get better statistical balance between groups, especially for small trials. An alternative approach with some strong advocates is minimization, which may be partly random or fully deterministic depending on how it is implemented. But might there be a useful third alternative applicable to some types of trials?
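To make one of these variants concrete, permuted-block randomization can be sketched in a few lines of Python. The block size and arm labels below are arbitrary illustrative choices, not taken from any particular trial:

```python
import random

def permuted_block_randomization(n_subjects, block_size=4, arms=("treatment", "control")):
    """Assign subjects to arms in shuffled blocks of fixed size.

    Each block holds an equal number of slots per arm, so group sizes can
    never drift apart by more than a block; shuffling within each block
    keeps individual assignments unpredictable.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block size must be a multiple of the number of arms")
    per_arm = block_size // len(arms)
    assignments = []
    while len(assignments) < n_subjects:
        block = [arm for arm in arms for _ in range(per_arm)]
        random.shuffle(block)  # permute this block only
        assignments.extend(block)
    return assignments[:n_subjects]
```

With 40 subjects and a block size of 4, for example, each arm receives exactly 20 subjects, and the running imbalance at any point in the sequence is bounded by the block composition.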


The above thought was inspired by the following problem that an Accelrys field scientist presented to me: Given 50 animal subjects with varying body mass, how can they be divided into 5 equal-sized groups such that the body mass mean and variance are roughly the same for each group? In other words, we want the distribution of body mass within each group to be nearly the same. To both of us, this looked like a Pareto problem in which the variance of the within-group mean (variance-of-mean) and the variance of the within-group variance (variance-of-variance) across groups should be simultaneously minimized.


I implemented a simple genetic algorithm in a Pipeline Pilot protocol to do a Pareto optimization for this problem. Here's what the typical results look like in a tradeoff plot showing the Pareto results after 500 iterations, compared to results for randomization and minimization (both constrained to result in equal group sizes):
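The Pipeline Pilot protocol itself isn’t reproduced here, but the two objectives and a random-search baseline can be sketched in Python as a simplified stand-in for the genetic algorithm (the function names and parameters are mine, for illustration only):

```python
import random
from statistics import mean, pvariance

def objectives(masses, assignment, k=5):
    """Return (variance-of-mean, variance-of-variance) across the k groups."""
    buckets = [[] for _ in range(k)]
    for mass, group in zip(masses, assignment):
        buckets[group].append(mass)
    group_means = [mean(b) for b in buckets]
    group_vars = [pvariance(b) for b in buckets]
    return pvariance(group_means), pvariance(group_vars)

def random_equal_partition(n, k):
    """Randomly assign n subjects to k equal-sized groups."""
    assignment = [i % k for i in range(n)]
    random.shuffle(assignment)
    return assignment

def pareto_front(points):
    """Keep only the non-dominated points (lower is better in both objectives)."""
    return [p for p in points
            if not any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points)]

# 50 subjects with varying body mass, partitioned many times at random;
# a real GA would evolve assignments instead of sampling independently.
masses = [random.gauss(30.0, 5.0) for _ in range(50)]
candidates = [random_equal_partition(50, 5) for _ in range(500)]
front = pareto_front([objectives(masses, a) for a in candidates])
```

Each candidate partition maps to one point in the tradeoff plot; the genetic algorithm differs in that it mutates and recombines promising partitions rather than drawing them independently.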



Variance-of-variance vs. variance-of-mean for alternative methods of assigning subjects to groups



Each color/symbol represents a different approach, with results from Pareto optimization shown as blue circles; results from multiple random partitionings shown as black stars; and results from minimizations with the subjects processed in different random orders shown as red triangles. Observe that for this very simple problem, the Pareto approach gives the lowest variance-of-mean and variance-of-variance values.


Might Pareto optimization be an alternative to traditional randomization and minimization for trial design? I don't know. Given the constraints of the approach, it may be more applicable to lab studies than to clinical trials. But these exploratory results look intriguing. I provide the protocol along with some more thoughts in an Accelrys Community forum posting.

Categories: Data Mining & Knowledge Discovery Tags: statistics, optimization, pipeline-pilot, experimental-design

ELNs in Regulated Environments

Posted by dcurran Aug 27, 2010

We had a lot of questions during Millennium’s C&EN webinar about the use of ELNs in regulated environments. This question is even more relevant given the release this week of Symyx Notebook by Accelrys 6.5, which offers improvements specifically for chemical process development and scale up. Here Michael Weaver, principal with The Weaver Group, a consultancy specializing in workflow integration in the QA, regulatory, and process analytical areas, comments on the specific challenges of using an ELN in regulated environments. He’ll offer further insights in entries over the coming weeks.


In physics, scientists often refer to a “theory of everything” that seeks to combine the seemingly different laws of the large scale of general relativity and the minute scale of quantum mechanics. Similarly, IT professionals in pharma dream of being able to unify the seemingly conflicting requirements of their R&D divisions and manufacturing areas under the umbrella of a single informatics solution.


Will this ever be possible? With the help of ELN software vendors, I think so.


Dealing with regulatory compliance issues is the main reason that pharma has been considered behind the curve when it comes to developing and adopting innovative informatics. Not only is it hard to implement electronic systems that effectively navigate the hurdles regulations create; it’s also easier and less risky to play catch-up after someone else has adopted a new technology.


Within this landscape of extreme caution, pharma has been implementing ELNs, particularly in research, where compliance issues are less onerous. They’ve seen gains in productivity and collaboration. Moreover, electronic data has been proven to be safe, secure and able to withstand the legal requirements of patent protection and proof of discovery.


The experience with ELNs in research has led pharma to wonder whether ELNs could also be used to increase productivity in areas such as manufacturing and quality assurance. But can ELNs improve business functions while maintaining compliance?


Laboratories that perform quality assurance functions for pharmaceutical manufacturing must follow Good Manufacturing Practices (GMP), which are enforced by the FDA. This means organizations wanting to use an ELN in these areas must evaluate whether ELNs will meet these GMP requirements and ensure compliance.


One of these requirements is computer system validation. Organizations must validate any software used in a GMP environment. Validation can be quite a labor-intensive and expensive part of the implementation, and it helps greatly if the ELN vendor is familiar with validation needs. In fact, some vendors may even be able to supply validation packages with their product. Additionally, a vendor audit is often performed as part of the validation process, so it is important to make sure that the ELN vendor is comfortable with supplier audits from regulated companies and can provide certifications and access to internal software development procedures if required.


Another area of concern for IT professionals in regulated areas is 21 CFR Part 11, the FDA rule covering electronic records/electronic signatures. Although this guideline was first introduced over a decade ago, many organizations still question the full extent and scope of these guidelines. Obviously, though, an informatics system such as an ELN that gathers and reports quality assurance data must adhere to e-record and e-signature guidelines. For instance, there must be a complete audit trail of the data so that any changes made are fully traceable (in other words, every change to data must indicate the who, what, why, and when associated with the change).
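As a minimal sketch of what such an audit-trail entry might capture (the field names and helper below are illustrative, not the schema of any particular ELN):

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)  # entries are immutable once written
class AuditEntry:
    who: str   # authenticated user who made the change
    what: str  # the record/field changed, with old and new values
    why: str   # reason for the change, mandatory in GMP contexts
    when: str  # UTC timestamp supplied by the system, never by the user

def record_change(trail, who, what, why):
    """Append a new entry; the trail is append-only and never edited in place."""
    entry = AuditEntry(who, what, why, datetime.now(timezone.utc).isoformat())
    trail.append(entry)
    return entry
```

The essential design point is that the trail only grows: corrections are recorded as new entries alongside the original values rather than overwriting them.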


There is no reason that a quality assurance area should have to sacrifice the flexibility and collaboration enjoyed by their counterparts in research in order to achieve compliance. I don’t think it was ever the intention of the FDA to stifle innovation and technology. Although the regulatory hurdle can be challenging, companies bold enough to take on the challenge should see not just the obstacle itself, but the competitive advantage an ELN investment achieves.

Categories: Electronic Lab Notebook Tags: workflow-integration, guest-authors, validated-workflows, process-chemistry, regulated-workflows
It’s very fitting that the 6.5 version of Symyx Notebook by Accelrys—the first product released since the merger of Symyx and Accelrys—spans many of the sectors our combined company now touches. We’re delivering improvements supporting chemical process development, production scale-up, and administrative compliance for scientists working in pharmaceuticals, fine chemicals, agrochemicals, and consumer products. 

The 6.5 release continues to add specific domain functionality and general core support that lets Symyx Notebook by Accelrys find productive use in a range of labs. In fact, research scientists and their daily activities in the lab are key constituents and inspirations for our notebook team, and we are seeing broad use of our section templates in analytical, bioanalytical, and biology labs, in addition to our existing base in discovery chemistry.

The new functionality in this release is focused specifically on process chemists, who work in areas that are strictly regulated and subject to stringent quality assurance (QA) oversight. The greatest challenge faced by process chemists is identifying, optimizing, and delivering the best process to pilot and manufacturing plants in the shortest time possible while complying effectively and efficiently with regulations and QA mandates. 

As described in the press release, the 6.5 version of Symyx Notebook by Accelrys offers an array of new features requested by customers already working in regulated environments. Crucially, the system offers the ability to search experimental data stored in most legacy Symyx notebooks. Such functionality is clearly valuable to those who have these legacy Symyx notebooks. But the underlying API also provides an opportunity to develop a single portal to any ELN data—whether that ELN is from Symyx or another ELN vendor. 

We’ve put up a lot of information online; we hope you’ll check it out. In an upcoming post, I’ll summarize a case study written about ELN use in regulated environments at AstraZeneca and Bristol-Myers Squibb. Having completed the 6.5 release, we’re now working on the deliverables for the next release as well as our company integration project, with two things in mind: strengthening the data capture that is important to our ELN and providing advanced analytics through our combined product collections. This is a very exciting time to be part of the notebook product team at Accelrys.
Categories: Electronic Lab Notebook Tags: eln, symyx-notebook-by-accelrys, validated-workflows, process-chemistry, regulated-workflows

Today we announce the first major product release since the merger of Accelrys and Symyx. Fittingly, it’s a product that spans many of the sectors our combined company now touches. The latest 6.5 version of Symyx Notebook by Accelrys delivers improvements supporting chemical process development, production scale-up, and administrative compliance for scientists working in pharmaceuticals, fine chemicals, agrochemicals, and consumer products.






Visit our website to learn more about how the 6.5 release of Symyx Notebook by Accelrys supports process chemists. Representatives from AstraZeneca and Bristol-Myers Squibb discuss their use of an ELN in process chemistry in a downloadable PDF and a short video recorded at the Symyx Symposium. You’ll also find an informative slideshow describing the release. We also encourage you to check out our sibling blog, Symyx’s Life in the Electronic Lab, which will feature several posts over the coming weeks on ELN use in process chemistry and the 6.5 version of Symyx Notebook by Accelrys.

Categories: Electronic Lab Notebook, News from Accelrys Tags: eln, symyx-notebook-by-accelrys, process-chemistry

Counting up the compounds in a database should be as easy as, well, 1, 2, 3… But a recent thread on the Chemical Information Sources Discussion List (login required) pointed out that what counts is how (and what) vendors count. In this post, I'd like to comment on this confusion and explain how we count compounds for our premier sourcing databases: the Available Chemicals Directory (ACD) and the Screening Compounds Directory (SCD).


The discussion on CHMINF is far from the first time that scientists have complained about vendors providing apparently misleading database counts. That’s because of the perception in the database market that size matters. If you’re selling a database, it looks good to boast that you have more compounds or reactions or suppliers than everyone else. So it shouldn’t be surprising that vendors will jockey for position.


As this thread indicated, so much goes into a compound database that it’s possible to count in many different ways, which is why things get so confusing. In sourcing, for instance, a chemical supplier may sell a given unique chemical as many different products (HPLC grade, reagent grade, etc.) and offer various packages of each (10 g, 500 g, 1 kg, etc.). So as a database vendor, do you size up your database by counting the number of unique chemicals, or do you count each instance of that chemical separately, or do you count each package? All would be accurate! But for any comparisons between databases to be accurate, customers need to know what is being counted.
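A toy example makes the distinction concrete. The catalog rows below are invented; each row represents one orderable package:

```python
# Each row: (chemical, product grade, package size) -- invented example data
rows = [
    ("aspirin", "HPLC grade", "10 g"),
    ("aspirin", "HPLC grade", "500 g"),
    ("aspirin", "reagent grade", "10 g"),
    ("toluene", "reagent grade", "1 kg"),
]

unique_chemicals = len({chem for chem, _, _ in rows})              # 2
unique_products = len({(chem, grade) for chem, grade, _ in rows})  # 3
packages = len(rows)                                               # 4
```

The same four rows yield a count of 2, 3, or 4 depending on what you decide a "compound" is, which is exactly why comparisons between databases need the counting rule spelled out.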


Sourcing databases present another challenge in the numbers game as well. If a compound was available once, but isn’t available now, should it still count? With ACD, our intent is to emphasize the word available. Sure, we have a few older catalogs in the database, but those are the exception. Our goal is to provide the most comprehensive database of available chemicals. So for us, size may not matter as much as utility.


So with that said, our aim is to provide all the relevant statistics about what our databases contain so that you can see exactly what you are getting. Here is where ACD and SCD stand today:


ACD (2010.07)

  Chemicals: 2,962,241
  Products: 6,037,389
  Packages: 13,929,578
  Catalogs: 897
  New Chemicals: 286,407
  New Catalogs: 5
  Updated Catalogs: 77

SCD (2010.02)

  Chemicals: 9,648,480
  Products: 20,990,755
  Packages: 26,114,574
  Vendors: 53

With these numbers in hand you should be able to make the comparisons yourself.

Categories: Scientific Databases Tags: content-in-context, available-chemicals-directory, chemical-sourcing

Molecular and materials modeling has long been successful in areas such as catalysis, polymers, coatings, adhesives, and semiconductors. But did you know about all the applications in areas such as perfumes, chocolate, or cosmetics? These are just different types of materials after all, so it shouldn't surprise anyone that Accelrys announced it’s making inroads into the consumer packaged goods sector.

Over the past year several of my fellow scientists and I have been investigating the ways that both modeling and informatics solutions can be applied to CPG R&D. There are some great write-ups out there:

Also note Dr. Felix Grant's article Material Values in Scientific Computing World. And check out the great image on the cover.


What’s fueling this recent surge of scientific informatics and modeling into CPG? Like so many other R&D-based sectors, CPG has been challenged to remain competitive while holding down costs — and nothing says “take my money” like a wrinkle cream that really works and is sold over the counter at the local pharmacy! As the articles I've cited illustrate, predictive analytics can help R&D teams find answers faster and at lower cost than experimentation alone – a lesson that sectors such as pharmaceuticals learned quite some time ago. Add to this the fact that predictive (molecular) modeling methods have become easier to use and are increasingly merged with informatics, and you gain unprecedented R&D capabilities.


Modern pharmaceuticals are extensively modeled before chemicals are ever mixed in the lab. Maybe the next great perfume will come out of a computer simulation, too.

Categories: Materials Informatics Tags: cosmetics, publications, consumer-packaged-goods, fragrances, beverages, flavors, food

What are critical problems in alternative energy research? How does modeling play a role in bringing us closer to answers?


A recent review article on this topic by long-time associate Prof. Richard Catlow and colleagues caught my attention. Readers of this blog will be familiar with our many posts pertaining to 'green chemistry,' sustainable solutions, and the like. Last month, Dr. Misbah Sarwar of Johnson Matthey was featured in a blog and delivered a webinar on the development of improved fuel cell catalysts. Dr. Michael Doyle has written a series on sustainability. Drs. Subramanian and Goldbeck-Wood have also blogged on these topics, as have I. All of us share a desire to use resources more responsibly and to ensure the long-term viability of our ecosphere. This will require the development of energy sources that are inexpensive, renewable, non-polluting, and CO2 neutral. Prof. Catlow provides an excellent overview on the applications of molecular modeling to R&D in this area. Read the paper for a very comprehensive set of research problems and case studies, but here are a few of the high points.


  • Hydrogen production. We hear a lot about the "hydrogen economy," but where is all this hydrogen going to come from? Catlow's review discusses the generation of hydrogen from water. Research challenges include developing photocatalysts capable of splitting water using sunlight.

  • Hydrogen storage. Once you've created the hydrogen, you need to carry it around. Transporting H2 as a compressed gas is risky, so most solutions involve storing it intercalated in a solid material. LiBH4 is a prototypical example of a material that can reversibly store and release H2, but the process is too slow to be practical.

  • Light absorption and emission. Solar cells hold particular appeal, because they produce electricity while just sitting there (at least in a place like San Diego; I'm not so sure about Seattle). One still needs to improve conversion efficiency and worry about manufacturing cost, ease of deployment, and stability (with respect to weathering, defects, aging, and so forth).

  • Energy storage and conversion. Fuel cells and batteries provide mobile electrical power for items as small as hand-held devices or as large as automobiles. Catlow and co-workers discussed solid oxide fuel cells (SOFC) in their paper.


The basic idea with modeling, remember, is that we can test a lot of materials for less cost and in less time than with experiment alone. Modeling can help you find materials with the optimal band gaps for capturing and generating photoelectric energy. It can tell us the thermodynamic stability of these new materials: can we actually make them, and will they stick around before decomposing?
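As a toy illustration of that kind of screening, here is a simple band-gap filter. The material names, predicted gap values, and target window below are all invented for illustration; in practice the predicted properties would come from simulation results:

```python
def screen_by_band_gap(candidates, lo=1.0, hi=1.8):
    """Keep materials whose predicted band gap (in eV) falls inside a
    target window, e.g. one suited to single-junction solar absorbers."""
    return [name for name, gap in candidates if lo <= gap <= hi]

# (name, predicted band gap in eV) -- hypothetical values
candidates = [("mat-A", 0.4), ("mat-B", 1.3), ("mat-C", 2.6), ("mat-D", 1.6)]
shortlist = screen_by_band_gap(candidates)  # ["mat-B", "mat-D"]
```

Half the invented candidates are discarded before any lab work, which is the whole appeal: each property filter you can run computationally shrinks the list that experiment must confirm.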


Simulation might not hit a home run every time, but if you can screen out, say, 70% of the bad leads, you've saved a lot of time and money. And if you're interested in saving the planet, isn't it great if you can do it using fewer resources?


Check out some of my favorite resources on alternative energy, green chemistry, and climate change.


Categories: Materials Informatics, Modeling & Simulation Tags: catalysis, materials, atomic-scale-modeling, alternative-energy, green-chemistry, fuel-cells, hydrogen-storage, solar-cells
The food, cosmetics, and personal and home care industries seem to be finding a veritable goldmine for differentiating their products and winning consumer loyalty by using distinctive, sensory, and even therapeutic scents. A concomitant trend toward healthy, natural, and science-based efficacy in over-the-counter products is resulting in some interesting new options for consumers and a flurry of “new” product formulations. For example, a scent based on the smell of fresh-cut grass was recently introduced that is claimed to reduce the damage long-term stress can inflict on the body, including the negative effects on long-term memory, by reducing the structural changes that occur in the part of the brain associated with memory and spatial orientation.

A big challenge for the fragrance and flavors industry, not unlike many others, is that regulatory changes continue to reshape how business is done. As pricing pressures continue to drive up raw material costs, companies need every possible means to drive smarter, more efficient innovation. In the midst of this, these companies must focus on the areas their clients need to compete and grow; one such area is more creative, differentiating flavors and fragrances.

Globalization and emerging markets such as China, India, and Latin America continue to provide new opportunities for growth. However, to succeed in these emerging markets one cannot simply reuse successful existing flavors and fragrances, since consumers there have different aesthetic tastes. Continuous innovation is needed to be successful in these global markets. It is a big mistake to think that every chemical and formulation has already been invented. Smarter, faster, and more rational design of new compounds is necessary, and possible, for thriving and growing. Following in the footsteps of the life science industry, which has been using predictive techniques for years, more and more consumer products companies are turning to predictive science to achieve their innovation goals and stay in the game (see my article in Perfumer and Flavorist).
Categories: Data Mining & Knowledge Discovery, Modeling & Simulation Tags: cosmetics, publications, predictive-science, consumer-packaged-goods, formulations, fragrances
Accelrys was featured this month in a Bio-IT World article titled Workflow’s Towering Aspirations. In reporting on the challenges facing creators of data pipelining/workflow tools, John Russell raises several questions regarding the role that tools like Pipeline Pilot play in advancing drug discovery.

The central question Russell raises is “how far workflow tools can grow beyond personal productivity instruments into what Frost & Sullivan has termed scientific business intelligence platforms serving the enterprise.” I agree this question is key, but it is sometimes misinterpreted. “Serving the enterprise” is often equated with moving from providing software for an individual scientist to an ill-defined concept of an “enterprise user” and some fuzzy concept of increased organizational performance. We don’t necessarily see it that way.

Let’s say I’m a scientist at a pharma or biotech company, and I create a Pipeline Pilot data analysis protocol to automate and standardize what I previously did manually. That benefits me, but I can also share that protocol with my colleague sitting next to me, and she gains a similar benefit. Well, why should seating proximity determine who gains from my standardized process? In fact I can share my protocol widely with colleagues throughout my organization, across geographies and time-zones. So now we see clearly that the organization benefits both in terms of collective enhanced personal productivity, but also globally, due to the standardized application of analytical methods leading to more consistent data that ultimately supports better decision-making across projects and over time.

This is a key element of what we consider “scientific business intelligence.” More and more, the ability to transfer expertise across the enterprise has become a requirement. Data pipelining and workflow naturally facilitate master data management and good practices for data stewardship that benefit organizations internally and in their necessary collaborations with CROs and academics. These systems help organizations define consistent methodologies that can be extended and enforced both across and between organizations to ensure that data sets are comparable regardless of who creates them. Pipeline Pilot also scales while conforming to IT policies around security and authorization, which supports the growing demands for personal productivity on an enterprise scale. It’s this “enterprise readiness” that has made Pipeline Pilot the pipelining tool of choice at 20 of the top pharma companies.

Russell also notes the apparent hype associated with data pipelining/workflow tools. I’m not sure we really see or hear an excessive amount of “hype,” but some of the quotes in the article are telling. “The best possible data management tools would provide 10% of the solution, at most.” Whether this is the right value or not, it is a fact that significant inefficiencies remain in our industry due to the continued use of Excel copy-and-paste and similar manual, repetitive, error-prone processes. As another quote states, “Eventually, workflow tools can and should supplant many data analysis processes that are currently done with a combination of fragile spreadsheets and marginally re-usable macros.” This is consistent with estimates that more than half of a researcher’s time is spent performing these manual, data processing tasks—tasks that are highly amenable to being coded into workflows such as Pipeline Pilot protocols (e.g., R&D Productivity Analysis, Yazdani, Holmes, A.D. Little).

So we shouldn’t be thinking in terms of how much of the business of drug discovery (or other science and technology-based research) can be provided by workflow tools, but rather the extent to which these tools can minimize the time spent by scientists performing routine tasks and free them up to actually innovate and be productive within their organizations. This is, I believe, one of the “towering aspirations” of workflow tools. If we could reduce the time spent in low-productivity tasks to an amount closer to 10%, it would provide a massive benefit to research organizations. Wouldn't this make workflow tools worthy of just a bit of hype?
Categories: Lab Operations & Workflows, Bioinformatics, Cheminformatics, Materials Informatics, Executive Insights Tags: pipeline-pilot, data-management, business-intelligence, data-pipelining
According to Gabriel Weatherhead, lead systems engineer at Millennium: The Takeda Oncology Company, the real value of Symyx Notebook is that it offers an overall platform that lets R&D informatics teams give scientists what they want. [Editor's note: In July 2010 Symyx merged with Accelrys, Inc.]

Gabe made these comments to 438 participants in an American Chemical Society C&EN webinar on Thursday, July 22. A recording of the webinar is now available for download, and you can also read a write-up of the implementation.

Building applications that scientists will like and use is at the core of research IT’s mission, but that challenge was amplified at Millennium when it chose to implement an ELN across 10 different biological departments. The workflows and requirements ranged widely, Gabe explained. “Some scientists wanted a blank page that they could fill with information, others wanted stuff pulled in automatically off instruments and fields calculated automatically. Symyx Notebook provides an overall platform where you can build in hooks or just deliver something simple out of the box—whatever it is that scientists want.”

Another big change between the biology implementation completed this year and the chemistry implementation that Millennium started in 2003 was user expectations. “In the age of iPad and Windows 7, users expect more from software,” said Gabe. “They expect things to be streamlined, with shiny buttons. Today’s applications have to do as much of the grunt work as possible behind the scenes. This puts a lot of pressure on the implementation team to provide real value.”

Gabe said that it’s difficult to determine ROI for an application like an ELN. One measure of success for them is that their lab record keeping has been entirely electronic for four years. Productivity-wise, the ability to capture and search information previously bound in paper notebooks—sometimes illegibly—has helped Millennium protect and defend its intellectual property. Additionally, at least 50% of Millennium’s records are cloned from previous records. “The ability to document and set up a reaction has become trivial,” said Gabe, who was himself a scientist at Millennium before moving into research IT. “With the ELN, I could easily set up more experiments than I could work up.”

Gabe was only able to address five of the 145 questions asked by webinar participants, so watch this space—we plan to address some of the most popular questions in future posts.
Categories: Electronic Lab Notebook Tags: case-studies, webinars, eln, symyx-notebook-by-accelrys