Compared to chemical data, biological data has always been harder to manage electronically. But what exactly makes it so problematic? The major hurdle has to do with vocabulary and definitions: what exactly constitutes a unique therapeutic biomolecule and how to represent it so that it can be stored and searched electronically.
Before you can set up a system that can reliably search for and retrieve a unique substance, you first need to define and describe what you’ve made. And this is harder than it should be. Scientists and research managers I’ve spoken to acknowledge that even within the same organization, different departments use different words to describe the same thing. These “dialects” make it particularly difficult to deal with biologics, which are very often defined and characterized by what they aren’t or how they are made. A sequence of interest may be known and understood by its genealogy with respect to parent sequences or the steps and techniques used to make it. Shaken, not stirred doesn’t just apply to 007—the way a biologic is made and handled is often as important as the specific sequence itself.
Standardized vocabularies for specific modifications and processes can help organizations build consistent recipes to describe the biomolecules they are making. It also helps to be able to describe, electronically, the actual underlying structure of biomolecular components. Chemical nomenclature, standardized through IUPAC and other conventions, provides a “thing” that can be represented, stored, and, most importantly, retrieved and linked to associated information. Life scientists, however, have lacked a consistent way to represent biologics, which has led to ambiguous shorthand that can be particularly confusing when it comes to hybrid structures containing both amino acids and nucleotides.
Consider the sequence “GGG.” Any of those Gs could refer to guanine, the nucleic acid building block; glycine, the amino acid; or even guanosine, the nucleoside. There’s no ambiguity at all, though, if the shorthand description simply stands in for the underlying chemical structure of each G, stored electronically and visible with a simple mouseover. Here’s what that looks like:
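As a minimal sketch of the idea in Python: the same one-letter code resolves to a different stored structure depending on the kind of chain it appears in. The lookup table, field layout, and SMILES strings below are illustrative placeholders, not an actual registration schema.

```python
# Hypothetical lookup: the chain type disambiguates the one-letter code.
# SMILES strings are illustrative stand-ins for the stored structures.
STRUCTURES = {
    ("peptide", "G"): ("glycine", "NCC(=O)O"),
    ("dna", "G"): ("guanine", "Nc1nc2[nH]cnc2c(=O)[nH]1"),
    ("rna", "G"): ("guanosine", "Nc1nc2c(ncn2C2OC(CO)C(O)C2O)c(=O)[nH]1"),
}

def expand(sequence: str, chain_type: str) -> list[tuple[str, str]]:
    """Resolve each letter of a sequence to a (name, structure) pair."""
    return [STRUCTURES[(chain_type, letter)] for letter in sequence]

# "GGG" is unambiguous once the chain type is known:
print(expand("GGG", "peptide")[0][0])  # glycine
print(expand("GGG", "dna")[0][0])      # guanine
```

The point is not the particular encoding but the link: the letter is merely a view onto an unambiguous underlying structure.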
Such a system could also help with some dialect issues. For example, if sulfur and sulphur are both linked to element 16 [S], then searches based on S find the expected hits no matter how the element is named.
Scientists know that a UniProt sequence is more than a list of letters. But for that meaning not only to be useful but also to protect IP in an electronic environment, the list of letters must be linked inextricably to the underlying chemistry. Only then can the sequence be stored, searched, and modified to uniquely describe biomolecular components.
We think this approach goes a long way toward solving some of the challenges associated with representing biologics, but there’s clearly a lot to consider. What issues have you encountered in trying to describe, store, and search biologics electronically?
The contract research webinar series continues this week with a presentation by Roger Avakian, vice president, scientific development, at PolyOne Corporation, a polymer services company in Avon Lake, OH. His mandate is to develop strategic technology “breakthroughs”—no pressure there! I asked Roger for his perspective on the challenges in nanocomposite development and how modeling can help. Read on for his insights, and I encourage you to check out the recording of the webinar, particularly if your product requires you to put dissimilar things (solids, liquids, or even gases) together to get the best product performance.
The unique properties possible at the nanoscale have the potential to revolutionize materials development in a variety of industries. Yet commercialization of nanocomposites has not quite kept pace with that interest and enthusiasm. Why?
The main challenge in the development of nanocomposites is dispersal—separating the materials with the properties you want so that they can be used to develop products. It’s a challenge that illustrates the pros and cons at the heart of nanotechnology. Nanoscale materials possess unique and exciting properties, but at that scale, materials have such a high surface area that they tend to stick together. The labs that have discovered or initially characterized a material’s nanoscale properties have the luxury of using pretty much whatever it takes to disperse the material. But these techniques often fail to translate to an industrial environment.
At PolyOne, we were interested in developing montmorillonite nanocomposites. Clays have been used as fillers in plastics for years. Exfoliated montmorillonite offers excellent properties: high modulus with good impact resistance; flame and heat retardance; and barrier properties that make it very appealing in packaging applications. Manufacturers do some initial separation of nano-montmorillonite, but to put it into a polymer melt, companies must consider two things:
1) What compatibilizer to use
2) What processing methods to employ
Making the right choices disperses individual platelets of montmorillonite evenly in your polymer sample. But what are the right choices? Did the polymer melt turn into an unusable gel because of the wrong additive, too much of the right additive, or something that happened during processing? Answering any one of these questions is a research project in and of itself.
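To see why, consider how quickly the choices multiply. In the quick sketch below, the compatibilizer names, loadings, and processing values are entirely made up; the point is only the combinatorics.

```python
from itertools import product

# Hypothetical experimental variables; names and values are illustrative.
compatibilizers = ["A", "B", "C", "D"]
loadings_pct = [0.5, 1.0, 2.0, 5.0]
screw_speeds_rpm = [100, 200, 300]
temperatures_c = [180, 200, 220]

# Every combination is, in principle, a separate experiment.
runs = list(product(compatibilizers, loadings_pct,
                    screw_speeds_rpm, temperatures_c))
print(len(runs))  # 4 * 4 * 3 * 3 = 144 distinct experiments
```

Even this toy grid yields 144 runs; a realistic parameter space is far larger, which is exactly the territory modeling helps to prune.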
Modeling, of course, can help organizations understand the system without investing the hours of synthetic time needed to individually investigate every experimental parameter. But modeling also has a cost, requiring specialized software and expertise that not every organization possesses or wants to invest in. PolyOne, for instance, had no modelers on staff and certainly wasn’t interested in acquiring the software and hiring experts for what was essentially a proof of concept.
Outsourcing enabled us to obtain the modeling expertise we needed without investing in tools and expertise we might never use again in-house. It also enabled us to focus our resources on the part of the problem that we understood best: processing. Accelrys’ contract research team, meanwhile, explored the chemistry involved with the different compatibilizer choices. Through our work, we were able to select the best compatibilizer for our application and begin testing different processing techniques faster than we ever would have been able to on our own. More importantly, we were confident that we’d fully explored the impact of various experimental parameters, which we might not have had the time or expertise to do thoroughly without outside help.
While my webinar will specifically discuss our adventures with montmorillonite, the challenges we faced really apply to anyone trying to combine dissimilar materials so that the result performs in a controlled, predetermined way. With all the materials we now have to work with, in various forms, formulation is, not surprisingly, one of the biggest challenges in commercialization. What types of dispersion problems have you faced?
Bottom line: If you secretly feel befuddled about which direction to take when buying these products, here’s my advice: ignore the labels. Forget that they’re called “LIMS” or “ELN” or whatever label they are given. Determine what you need and look for that. If you find something that meets your needs, whatever it is called, that is the right product and strategy for you.
Exactly. As the capabilities of laboratory software have begun to converge, the definitions of “LIMS” and “ELN” have blurred. In fact, PharmaIQ emailed me the other day promoting a piece about whether SDMS will replace ELN. Given the alphabet soup of options available, Gloria is right. You have needs. Identify them and then see what vendor or product matches those needs, regardless of its three (or four) letter acronym.
I’d add one caveat: when you make your decision, consider not just what is right for you, but what is right for others around you, now and in the future. There’s a great discussion on LinkedIn that explains why. Tools in the electronic lab ultimately have to interface with other apps and across departments. Yet when you’re trying to solve an immediate, pressing problem, it’s sometimes hard to think broadly about how (or whether) the solution you choose will play with others.
I can’t find the reference now, but in another LinkedIn discussion, Michael Elliott pointed out that early LIMS adoption wasn’t strategic. Individual labs and project groups purchased the LIMS they liked from the vendor they liked—and the result was often disparate, disconnected systems that organizations are now working to consolidate. Can you really afford to repeat those LIMS-world mistakes, particularly now that there are so many solutions working within the electronic lab? You can avoid downstream consolidation and migration issues by thinking now about how many electronic systems you want to be supporting 5-10 years from now. Is it 1, 2, or 3… or more? Thinking longer-term can help you avoid point solutions tailored to an immediate need in favor of solutions that will scale for future needs and deployment.
One of the questions raised during Millennium’s C&EN webinar is whether (and how) ELNs can supplant entrenched, paper-based workflows in regulated environments. I thought the answer would carry more weight coming from two people who have successfully implemented ELNs in this area. The experiences are excerpted from a case study based on talks from Adele Patterson at Bristol-Myers Squibb and John Leonard from AstraZeneca at last May’s Symyx Symposium.
Adele and John were adamant that paper (and its electronic counterpart, paper on glass) is more a habit than a requirement in regulated environments. Paper notebooks are historically familiar, but were by no means designed as tools for GMP manufacture or GxP analytical R&D or whatever modern, validated workflow you want to name. Paper notebooks had to be adapted to this environment by scientists, quality assurance personnel, and compliance experts. Which means electronic systems can be adapted too, a realization that John says, “motivated all of our stakeholders to collapse our paper paradigms and embrace new electronic ones that would allow seamless transfer of process information through all stages of R&D and ultimately to commercial manufacture.”
Paper, ultimately, is just a tool, and once an organization has conceived of it as such, it can start examining how its workflows and processes might change if another tool was used to achieve the desired end. The changes were more than substantive at both AstraZeneca and BMS—they were revolutionary. AstraZeneca’s simple realization that in the electronic world a process can be described by a collection of documents in an ELN enabled it to shave 50% off the time taken to document experiments. Both organizations ultimately created more streamlined and efficient processes based on best practices. They are actually complying with regulations better than they were when they were using paper, and QA groups at both sites, who were initially skeptics of the ELN, became partners in the implementations and champions of the systems.
Adele noted in an interview recorded at Symyx Symposium that merely reproducing paper “on glass” provides only minimal value in the long run. Extending the workflow into a fully electronic environment enables organizations to eliminate sources of error, ensure standard processes are followed, remove cycle-time delays associated with moving paper or its electronic equivalents, and provide a research tool that automatically creates visualizations and report alerts so that laboratory teams can quickly see the information they are generating.
“Just because you have always done something a certain way in the paper world doesn’t mean that’s how we have to do it in the electronic world,” Adele says. “Sometimes you have to take a deep breath and step back to determine which steps are really value-added—what is the purpose of a particular step. Our assessment gave us the time to do this analysis and convince all of the stakeholders that we had the right processes in place to stay compliant.”
Open innovation, as defined by H.W. Chesbrough, calls for good ideas to come from both inside and outside a company. Clearly R&D organizations see the value in this advice. Contract research has surged even during the global recession, with one source noting it accounted for 29% of the $74 billion drug development budget in 2008.
But it’s not just in life sciences that contract research has become essential. Critical scientific challenges in a range of industries—chemical, automotive, aerospace, microelectronics, and consumer packaged goods—require new ways of pooling knowledge to better understand how systems fundamentally work. For example, you can see how contract research helped Johnson Matthey discover new fuel cell catalysts by downloading this webinar recording.
In the embed below, Lalitha Subramanian, senior director, fellow, and Accelrys blogger, explains how the Accelrys contract research team works with R&D organizations to extend their internal R&D capabilities. Subramanian will be kicking off a webinar series on contract research on September 15 with an intro to Accelrys contract research. A customer presentation by Roger Avakian of PolyOne Corporation follows on September 28, and on October 7 Johan Carlsson, a member of the Accelrys contract research team, will give an application-oriented talk on multiscale simulations of graphene. Click here to register or learn more about this webinar series.
This fall’s ACS meeting in Boston seemed lower key than the spring meeting in San Francisco. But maybe it was just that I was insanely busy with Chemical Information Division (CINF) activities, as my term as chair of the division ended at this meeting. I was really thrilled to be able to present Tony (Anton J. Hopfinger) with the 2010 Herman Skolnik award. I do not have a computational chemistry background, but I worked with Tony on my first MDL business development project, and his reasonable, fair, and generous nature left me with a long-term respect and affection for him.
I also quite enjoyed the speaker we had at the CINF luncheon. Michael Capuzzo, former Philadelphia Inquirer reporter, joined us to talk about his most recent book, The Murder Room: The Heirs of Sherlock Holmes Gather to Solve the World's Most Perplexing Cold Cases. While, yes, he doesn’t work directly in cheminformatics, Capuzzo painted a picture of information gathering and relationship extraction that mapped very closely to the challenges we information professionals face every day.
Capuzzo noted that over meals not unlike the one we had shared prior to his talk, a team of dedicated and skilled detectives known as the Vidocq Society pore over gruesome crime scene photos of corpses and cannibalism and detailed police investigative reports to try to bring serial killers to justice. Good thing we had finished eating before Michael took to the podium! Michael explained that the Vidocq Society has handled hundreds of cases over the years and made significant contributions leading to new arrests and/or exonerations. Michael noted there is still a big difference between knowing who committed the crime and being able to secure justice, and so the Vidocq Society is very strict about getting involved only if family members and local law enforcement invite them to participate.
My colleague Keith Taylor participated in a session dedicated to chemical representation, which Wendy Warr reviewed recently. The session showed that work in this area is by no means finished. One talk covered the recent IUPAC recommendations for chemical structure representation and another covered advances with InChI—both are true community projects that bring together vendors, academics, and publishers. I’ll admit I held my breath when Keith, who spoke on our new flexible sequence representation, launched a live demo—but it worked splendidly, really showing off the flexibility and power of this new approach. You can view Keith’s slides below.
Finally, the ACS exhibition allowed the Accelrys content group to showcase all the database content that is now part of the Accelrys product family. We continue to focus on chemical sourcing, reactions, and bioactivity information, offering multiple in-house and hosted options. We also demoed the All New DiscoveryGate at ACS and are looking for beta testers to come on board as we begin adding reaction content to the sourcing data currently delivered through that system. If you’d like to participate, please email email@example.com or comment on this post.
The day began with some really complex structures from Mark Andrews of DuPont. It reminded me a bit of the good old days of polymer representation at MDL before MDL shifted focus largely to life sciences—all those brackets and attached data and stuff—but things have moved on since then, and DuPont have done painstaking work on “no structures” and mixtures and all sorts of polymers. Most impressive. In contrast, Jonathan Brecher’s structures were “beautiful”. Jonathan has ensured that ChemDraw structures follow IUPAC’s standards for graphical representation of chemical structure diagrams.
Steve Heller gave a status report on InChI and InChIKey. There was not a lot of new material in his talk since the one he gave at the spring ACS meeting, but I was intrigued that Steve was much lower key, much less challenging than in previous talks. I got the feeling that InChI has now entered the mainstream.
Keith Taylor described Accelrys’s hybrid representation for biological sequence information. It is a “best of both worlds” approach combining the compactness of a sequence with detailed connectivity information for modified regions. While Symyx was beavering away on self-contained sequence representation (SCSR), Accelrys was touting BioReg, its biological registration system. Now that the two companies are united, there could be genuine synergies between BioReg and SCSR, with resultant user benefits.
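As a rough illustration of the “best of both worlds” idea (this is not the actual SCSR format, whose internals aren’t described here): most positions stay as compact one-letter codes, while a modified position carries its explicit structure. The field names and SMILES string below are hypothetical.

```python
# Illustrative only: a sequence in which most residues are plain one-letter
# codes and modified positions expand to explicit connectivity.
# Field names and the SMILES string are hypothetical.
sequence = [
    {"code": "A"},
    {"code": "G"},
    {"code": "C", "modification": {
        "name": "modified cysteine",        # hypothetical label
        "structure": "OC(=O)CSCC(N)C(=O)O", # explicit structure
    }},
    {"code": "K"},
]

def compact(seq):
    """Render the compact one-letter view; lowercase flags modified positions."""
    return "".join(r["code"].lower() if "modification" in r else r["code"]
                   for r in seq)

print(compact(sequence))  # AGcK
```

The compact view stays searchable as a string, while the full record preserves the chemistry of the modified residue.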
Szabolcs Csepregi talked about Markush structures: I won’t say more since I hope that anyone interested will be able to find most of this on the ChemAxon website. The work is not new to me, but Molecular Networks’ work was. Christof Schwab described CSRML, a new markup language for chemical substructure representation. At first I wondered why we need this if we trust Chemical Markup Language (CML), but I learned that CML does not handle query structures.
My apologies to anyone who reads this post and wonders why I did not mention his or her talk. And anyone who wants lots of detail can always buy my report on the meeting, which I'll be publishing in January.