Compared to chemical data, biological data has always been harder to manage electronically. But what exactly makes it so problematic? The major hurdle has to do with vocabulary and definitions: what exactly constitutes a unique therapeutic biomolecule and how to represent it so that it can be stored and searched electronically.
Before you can set up a system that can reliably search for and retrieve a unique substance, you first need to define and describe what you’ve made. And this is harder than it should be. Scientists and research managers I’ve spoken to acknowledge that even within the same organization, different departments use different words to describe the same thing. These “dialects” make it particularly difficult to deal with biologics, which are very often defined and characterized by what they aren’t or how they are made. A sequence of interest may be known and understood by its genealogy with respect to parent sequences or the steps and techniques used to make it. Shaken, not stirred doesn’t just apply to 007—the way a biologic is made and handled is often as important as the specific sequence itself.
Standardized vocabularies for specific modifications and processes can help organizations build consistent recipes to describe the biomolecules they are making. And another thing that will help is the ability to describe electronically the actual, underlying structure of biomolecular components. Chemical nomenclature, standardized through IUPAC and other conventions, provides a “thing” that can be represented and stored and, most importantly, retrieved and linked to associated information. But life scientists have lacked a consistent way to represent biologics, which has led to ambiguous shorthand that can be particularly confusing when it comes to hybrid structures containing both amino acids and nucleotides.
Consider the sequence “GGG.” Any of those Gs could refer to guanine, the nucleic acid building block; glycine, the amino acid; or even guanosine, the nucleoside. There’s no ambiguity at all, though, if the shorthand description simply stands in for the underlying chemical structures for each G, stored electronically and visible with a simple mouse-over. Here’s what that looks like:
Such a system could also help with some dialect issues. For example, if “sulfur” and “sulphur” are both linked to element 16 (S), then searches based on S find the expected hits no matter how the element is named.
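As a rough illustration of the idea, here is a minimal Python sketch of shorthand symbols that resolve to explicit chemical identities, and of spelling variants that resolve to one element. The `Monomer` class, the `MONOMERS` and `ELEMENT_SYNONYMS` tables, and both function names are hypothetical; they are not part of any Accelrys product, and a molecular formula stands in here for what would be a full structure record in a real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Monomer:
    symbol: str   # one-letter shorthand shown to the user
    kind: str     # "amino_acid", "nucleobase", or "nucleoside"
    name: str     # unambiguous chemical name
    formula: str  # molecular formula, standing in for a full structure


# Hypothetical lookup table: the same letter maps to different chemistry
# depending on the kind of sequence it appears in.
MONOMERS = {
    ("amino_acid", "G"): Monomer("G", "amino_acid", "glycine", "C2H5NO2"),
    ("nucleobase", "G"): Monomer("G", "nucleobase", "guanine", "C5H5N5O"),
    ("nucleoside", "G"): Monomer("G", "nucleoside", "guanosine", "C10H13N5O5"),
}

# Hypothetical synonym table: spelling variants resolve to one atomic number.
ELEMENT_SYNONYMS = {"sulfur": 16, "sulphur": 16, "s": 16}


def resolve(sequence, kind):
    """Replace each shorthand letter with its underlying chemical record."""
    return [MONOMERS[(kind, symbol)] for symbol in sequence]


def element_number(term):
    """Map an element name or symbol, in any supported spelling, to its atomic number."""
    return ELEMENT_SYNONYMS[term.strip().lower()]


if __name__ == "__main__":
    for kind in ("amino_acid", "nucleobase", "nucleoside"):
        print(kind, [m.name for m in resolve("GGG", kind)])
    print(element_number("Sulphur") == element_number("sulfur"))
```

Because the sequence type travels with the letters, “GGG” stops being ambiguous, and a search keyed on the atomic number finds records however the element name was typed.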
Scientists know that a UniProt sequence is more than a list of letters. But for that meaning to be not just useful but also able to protect IP in an electronic environment, the list of letters must be linked inextricably to the underlying chemistry. Only then can the sequence be stored, searched, and modified to uniquely describe biomolecular components.
We think this approach goes a long way toward solving some of the challenges associated with representing biologics, but there’s clearly a lot to consider. What issues have you encountered in trying to describe, store, and search biologics electronically?
One of the most talked-about scientific publications this year described the creation of a bacterial cell controlled by a synthetic genome (Gibson et al., 2010). The team that conducted the research, based at the J. Craig Venter Institute, synthesized the modified genome sequence of one species of Mycoplasma (about one million base pairs) and successfully transplanted it into the cell of another Mycoplasma species. The phenotype of the resulting bacteria was exactly as expected and the cells were shown to be capable of self-reproduction.
The paper sparked a series of debates about the significance of this particular experimental strategy and to what extent it constituted the creation of synthetic life. Additionally, and of more practical interest, the efficiency of this de novo strategy was also widely debated. While there is relatively little dispute about the potential value of engineering new biological systems, the work fuelled an ongoing controversy about the efficiency of such an empirical approach to creating a new biological system compared to more conventional strategies that modify existing genomes in their native cellular environments.
But the study also emphasized some more specific challenges beginning to emerge as genome sequencing establishes itself as a mainstream tool in the Life Sciences. Of the many technical obstacles that the team overcame in creating the new cell, one of the most surprising was the impact of seemingly trivial error rates on the successful creation of the synthetic genome sequence. A single base pair deletion in the dnaA gene, involved in chromosome replication, rendered the transplanted cells unviable. The failure to identify this error during quality control sequencing of the synthetic genome significantly delayed the completion of the project. As soon as this one-in-a-million error was identified and rectified in the synthetic sequence, the team was able to successfully recover viable cells.
In this case, the significance of the sequencing error in determining the synthesized genome sequence was obvious, as it resulted in an effectively lethal genotype. However, more generally, it emphasizes the critical need for highly accurate sequence determination. As the pharmaceutical industry increasingly relies on genome sequence data as a foundation of personalized medicine, the accuracy of the genotypic data collected on a large scale will come sharply into focus. Being able to determine sequences with 100% accuracy, particularly in non-coding regions of genomes, may become a challenging prerequisite to personalized therapeutic strategies.
Accelrys is actively working on products that can play a critical role in quality control of biological data, such as the Next Generation Sequencing (NGS) component collection and a biological registration system. In the case of the NGS collection, alongside the integration of the latest mapping and assembly algorithms, the core data pipelining capabilities of the Pipeline Pilot platform make it possible for scientists to develop quality control pipelines without any programming knowledge. As applications of genome sequencing such as synthetic biology and personalized medicine progress, such computational approaches to quality control will play an increasingly central role.
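To make the quality control idea concrete, here is a minimal Python sketch of the kind of check that would flag a single-base deletion: compare the designed sequence against the sequence actually read back and report every difference. It uses the standard library's `difflib.SequenceMatcher`, which is a generic text-diffing tool and not a sequence aligner; it is not how Pipeline Pilot or the NGS collection works, and it would not scale to a million-base genome. The function name and the toy sequences are made up for illustration.

```python
from difflib import SequenceMatcher


def find_discrepancies(reference, observed):
    """List every region where the observed sequence departs from the reference.

    Each entry records the kind of change ("replace", "delete", or "insert"),
    the 0-based position in the reference, and the bases involved.
    """
    # autojunk=False stops difflib from ignoring very frequent characters,
    # which matters for a four-letter alphabet.
    matcher = SequenceMatcher(None, reference, observed, autojunk=False)
    return [
        {
            "op": tag,
            "ref_start": i1,
            "ref_bases": reference[i1:i2],
            "obs_bases": observed[j1:j2],
        }
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    designed = "ATGGCTAGCTAA"   # toy "intended" sequence
    sequenced = "ATGGCAGCTAA"   # same sequence with one base missing
    for issue in find_discrepancies(designed, sequenced):
        print(issue)
```

A production pipeline would use a proper alignment algorithm and take read quality into account, but the principle is the same: no synthetic construct moves forward until its discrepancy list is empty.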
Recently I was reading a very interesting article on the rise of Roundup-resistant weeds. As a person who likes corn-based products, I find this quite important. However, unlike my recent posts on biochemistry and the development of bio-crude and biomass conversion systems, this is about the convergence of chemistry, environmental fate and toxicology, and genetic biology.
What I mean is that, as the threat of large-scale resistance to current-generation pesticides grows, some farmers are predicting cotton and soy crops having 30%+ weed content in less than five years. This means that, just to sustain our current food production levels and efficiencies, we will have to develop not only new compounds but also herbicide-resistant variants of older compounds and gene lines. This “old is new again” answer to sustaining our food pipeline is somewhat ironic, but it is clear that all the major agrochemical companies are approaching this and that the next-generation choice of chemicals will be governed by the development of seed lines that are tolerant to the materials in use.
Interestingly, this also creates potentially captive farmers who gear their agricultural business to a specific seed class and so are locked into a specific herbicide regime and process. What is clear to me is that the need to analyze biological entities alongside chemical, genomic, environmental and other complex data types will require a very flexible, extensive data platform and a unique type of registration engine.
The inherently complex nature of cell lines, plasmids, proteins, antibodies and vaccines makes a biological registration system challenging. Yet such systems are needed so that researchers and companies can track these entities and their relationships, creating critical intellectual property positions as well as connections to past research and manufacturing processes.
Patterned on the services of registration systems for chemical entities, which are well-known and entrenched in the drug discovery process, the Accelrys Biological Registration system is an "intelligent" solution for registering, associating, searching and retrieving data for entities such as siRNA, plasmids, cell lines, proteins, antibodies, vaccines and future biological entities.
Join us on Wednesday, May 26 for our live webinar, “Intro to Accelrys Biological Registration,” the first in a series on biological registration.
Accelrys has announced the commercial availability of the award-winning Accelrys Biological Registration, the world's first multi-entity, fully-integrated, flexible and extensible database for biological entities. The enterprise-scalable system supports eight major biological entities: Yeast, Cell Lines, DNA, Protein, Plasmid, Vaccine, Antibody and siRNA.
Although registration has been a standard in small molecule research for many years, it is relatively new to biological sciences. With the advent of Accelrys Biological Registration, companies with innovation in biology can now:
Implement and automate the process of biological entity registration
Capture, secure and protect corporate intellectual property
Search, retrieve and utilize biological information across the enterprise
Enable scientists to collaborate and share biological information
Increase operational efficiencies and reduce costs through registering biological entities
Reduce the risks associated with failing to register biological entities
This product was developed in a pre-competitive environment with several of the world's leading biopharmaceutical companies, including Abbott Laboratories and Merck & Co., Inc. With the release of Accelrys Biological Registration, we are further demonstrating our commitment to ongoing innovation in the scientific informatics market.
We invite you to learn more about Accelrys Biological Registration:
Register for our Accelrys Biological Registration webinar series
Biology is undergoing a revolution and is becoming a more analytical science with the advent of omics, high content screening, next generation sequencing, and other methods. These methods lead to a more in-depth understanding of systems biology and the discovery of new biomarkers. This greater understanding can be used to fill in our knowledge about pathways, to the point of building mathematical models of the multiple processes involved in any response to stimuli. All of this taken together should increase the odds of success by providing better information on which to base decisions.
The other area in biology that has great opportunities is the use of biologics as drug entities. These drug entities vary in complexity and include antibodies, vaccines, siRNA, etc. The value to the marketplace is in the hundreds of billions of dollars, and intellectual property (IP) protection is essential. Some of the largest patent infringement awards ever made have involved biologics.
In the process of building out these analytical biology systems, and biologics as drug entities, there are many biological innovations and inventions: for example, new stem cell lines, antibody generation as a tool or as a drug entity, plasmids, algae strains, etc. There is also a lot more inventory, reagents and data to track today than ever before. The best way to track data across multiple sources is to use a consistent and meaningful key. In the chemical space, this is handled by using a registration system to uniquely identify an entity and give it a unique integer that represents the entity in every data system. Until recently, a biologist might track a bar code for a cell line or antibody in a notebook, and perhaps a location for this entity in a simple, lab-based inventory system. However, this type of system only tells the same researcher where the cell line is, not uniquely what it is. In order for the entire company to benefit from the inventory and protect its IP, the biological entity needs to be described uniquely. This is a rather new concept for biologists, and it needs to be carefully considered moving forward in order to better protect IP, manage expensive reagents, implement safety systems and, most importantly, ensure that data can be queried and aggregated. All of this has been implemented in chemistry and shown to be of great value. Now it is biology’s turn.
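The core registration idea described above, one unique integer per unique entity, can be sketched in a few lines of Python. This is a toy illustration and not the Accelrys Biological Registration system: the `BioRegistry` class and its methods are hypothetical, and it assumes that an entity is fully defined by its type and sequence, which is a deliberate simplification since real biologics are often also defined by how they were made.

```python
import hashlib
from itertools import count


class BioRegistry:
    """Toy registry: one integer ID per unique (entity type, sequence) pair."""

    def __init__(self):
        self._ids = {}          # canonical key -> registration number
        self._records = {}      # registration number -> stored record
        self._next_id = count(1)

    @staticmethod
    def _canonical_key(entity_type, sequence):
        # Normalize case and whitespace so trivially different spellings of
        # the same entity collapse to the same key.
        normalized = "".join(sequence.split()).upper()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        return (entity_type.strip().lower(), digest)

    def register(self, entity_type, sequence, **annotations):
        """Return the existing ID for a known entity, or assign a new one."""
        key = self._canonical_key(entity_type, sequence)
        if key in self._ids:
            return self._ids[key]
        reg_id = next(self._next_id)
        self._ids[key] = reg_id
        self._records[reg_id] = {
            "type": key[0],
            "sequence_sha256": key[1],
            **annotations,
        }
        return reg_id

    def lookup(self, reg_id):
        """Fetch the stored record for a registration number."""
        return self._records[reg_id]


if __name__ == "__main__":
    registry = BioRegistry()
    first = registry.register("Protein", "MKT AYIA", location="Freezer 3")
    again = registry.register("protein", "mktayia")   # same entity, new spelling
    other = registry.register("DNA", "ATGC")
    print(first, again, other)
```

The integer returned by `register` is the key that every other system (inventory, assay results, notebooks) would store, so that a query on that key aggregates everything known about the entity rather than just where one vial sits.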
Does your organization have a Biological Registration system? How could such a system add value to your organization?
Learn how Accelrys is at the forefront of scientific innovation by being one of the first to preview our BioIT World award-winning application, Accelrys Biologics Registration. Developed with leading pharmaceutical companies, the application was designed to address the challenges posed by the dynamic nature of biological entities.
Get a glimpse into the latest release of Pipeline Pilot and the Imaging Collection, or hear how Accelrys products are being used to address next-generation sequencing analysis challenges by attending “Pipelining Your Next Generation Sequencing Data,” on Wednesday, April 21, 12:00 pm in Track 3.
Visit us at booth #301-303 to learn more about our leading scientific informatics solutions.
Accelrys is the official Twitter sponsor for BioIT World Conference & Expo ’10. Follow us (#BioIT10) for your chance to win an Apple iPad.
Accelrys has recently concluded a series of meetings with a specially convened Biological Registration Special Interest Group (SIG), formed between several major pharmaceutical companies and Accelrys. The objective of this forum was to understand some of the critical market and product requirements needed in order to build a state-of-the-art Biologics Registration system.
The success of the SIG can be attributed to the customer members being very open towards one another, in spite of being competitors, and the tremendous diligence each company put into specifying user requirements. This open and collaborative approach to software development has become an innovative way to introduce first-of-a-kind technology into the market.
First-of-a-kind software is usually developed as a bespoke project for a single company and then modified over time to meet the needs of the wider market. This can create disadvantages for early adopters as the product functionality evolves and improves with subsequent releases. This situation can be avoided by getting a wider set of requirements through a collaborative SIG formed of a diverse and representative sample of interested parties.
Capturing and prioritizing a wider set of requirements, with leading companies discussing and debating the relative merits and benefits of proposed features, is a more efficient and effective way of understanding market requirements than more traditional methods. The approach also enables the development team to capture feedback and more rapidly create a product that should be attractive to the wider market. The anticipated result is the timely delivery of a product that is well positioned to capture both broad interest and market share.
Have you innovated through collaborative work groups? If so, we would welcome the chance to learn from your experience.