In the rapidly growing field of genomics, pattern matching against a sequence of nucleotides or amino acids is critical to the assembly, annotation and comparison of complex genomes. This is essential to tackle some of the most prevalent diseases in humans, such as cancer, as well as to identify and prevent the pathogens that can decimate the staple crops that provide global food security.
As gene-sequencing technologies continue to evolve, the public sequencing databases that contain this knowledge are doubling in size every 18 months or less. Identifying and understanding newly sequenced genes through extensive searches of these DNA databases is becoming too expensive for researchers in genome analysis. Such large databases require access to large HPC resources that consume vast amounts of energy for power and cooling.
The new Genetic Search System (GENESYS) will perform large-scale gene database searches powered by just a standard mains supply. Its success will dramatically reduce the capital and energy costs required to undertake this fundamental DNA research. Identifying and understanding newly sequenced genes requires extensive searches of such databases. For example, after discovering a previously unknown gene in a mouse, a scientist will typically search the human genome to see if humans carry a similar gene. DNA searches are also used to understand the evolutionary function of, and relationships between, species and their common ancestors.
To ensure searches complete in a reasonable time, researchers rely on large and costly HPC systems and robust search software such as BLAST, one of the most widely used bioinformatics applications. In this collaboration, TGAC will be the first to apply Optalysys's revolutionary optical processing technology to the field of bioinformatics. The technology will be developed to perform BLAST-like searches against the 64 million+ base pairs of DNA within the Human Microbiome Mock Community database, which has been extensively studied at TGAC.
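To give a feel for what a similarity search over DNA involves, here is a minimal Python sketch. It is not BLAST and it is not the Optalysys method: it simply slides a query along a reference sequence and counts matching bases at each position. The function name `best_match` and the example sequences are hypothetical and chosen only for illustration.

```python
def best_match(query: str, reference: str) -> tuple[int, int]:
    """Return (offset, matching_bases) for the best ungapped placement
    of `query` within `reference`.

    Illustrative brute-force scan only; real tools such as BLAST use
    heuristics and scoring schemes that are not reproduced here.
    """
    if not query or len(query) > len(reference):
        raise ValueError("query must be non-empty and no longer than reference")

    best_offset, best_score = 0, -1
    # Try every position where the query fits inside the reference.
    for offset in range(len(reference) - len(query) + 1):
        window = reference[offset:offset + len(query)]
        # Score = number of positions where the bases agree.
        score = sum(q == r for q, r in zip(query, window))
        if score > best_score:  # keep the first best-scoring position
            best_offset, best_score = offset, score
    return best_offset, best_score


if __name__ == "__main__":
    # Hypothetical toy sequences, not taken from any real database.
    print(best_match("GATTACA", "CCGATTACATT"))
```

The work in this naive scan grows with the product of query length and database length, which is one way to see why searches over ever-larger databases become so demanding of compute resources.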
TGAC’s HPC systems collectively consume up to 130 kW of power, including the power needed to remove heat mechanically. With GENESYS, TGAC expects to reduce its power consumption by over 95% compared to its existing HPC facilities, while significantly reducing the environmental impact of running traditional HPC.
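The scale of that saving can be checked with simple arithmetic. The short Python sketch below uses only the two figures quoted above, a peak draw of 130 kW and a reduction of at least 95%; the function name `remaining_power_kw` is hypothetical, and the result is an upper bound rather than a measured value.

```python
PEAK_POWER_KW = 130.0   # "up to 130 kW", as quoted for TGAC's HPC systems
MIN_REDUCTION = 0.95    # "over 95%" reduction expected with GENESYS


def remaining_power_kw(current_kw: float, reduction: float) -> float:
    """Power still drawn after cutting `current_kw` by the fraction `reduction`."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be a fraction between 0 and 1")
    return current_kw * (1.0 - reduction)


if __name__ == "__main__":
    upper_bound = remaining_power_kw(PEAK_POWER_KW, MIN_REDUCTION)
    # Because the reduction is "over" 95%, the real figure would be lower.
    print(f"Remaining draw: under about {upper_bound:.1f} kW")
```

On these figures, the expected draw would fall from up to 130 kW to less than roughly 6.5 kW.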
"I'm really excited to be working with Optalysys to apply their revolutionary optical computing technology to aligning DNA sequences, a fundamental genomics problem. Their optical computing hardware has the potential to provide a huge step-change in the important areas of high-performance computing and energy efficiency, and I'm delighted that TGAC is leading the vanguard with this technology in the Biosciences. Such a valuable collaboration between industry and academia is only possible through the support of funding bodies such as Innovate UK," said Timothy Stitt, Head of Computing at TGAC.
Optalysys CEO, Dr Nick New, adds: “We are thrilled to be working with the leading research Institute in genomics, TGAC, on this exciting project. The GENESYS system has the potential to fundamentally change the field of DNA analysis by providing a system that is at least as accurate as current systems, but magnitudes faster, cheaper and smaller – fitting on a desktop. This brings the ability to perform this kind of analysis into the hands of a much broader base of companies and institutions who previously were unable to do so due to capacity constraints and prohibitive running costs.”
GENESYS addresses grand scientific challenges of genome analysis, such as high-resolution models in Big Data, but it is not limited to bioinformatics applications. Once a successful prototype is realised, the pioneering technology can be adapted for other projects relying on Big Data, e.g. real-time processing of satellite data and cancer diagnostics, thereby enabling companies and institutions to embark on data-driven innovation that was previously out of reach because of a lack of capacity and HPC-associated costs.
Led by Yorkshire-based Optalysys in partnership with TGAC in Norwich, this ground-breaking project combines hardware engineering and advanced science expertise, with the potential for environmental, economic and innovation impact on an unprecedented scale. Optalysys is an SME made up of highly experienced technical engineers and business leaders with the goal of bringing Big Data supercomputing to the world; TGAC is a research institute with around 90 staff, focused on the application of state-of-the-art genomics and bioinformatics to advance plant, animal and microbial research and promote a sustainable bioeconomy.
The global genomics market is predicted to grow at a CAGR of 10.3% to reach $22.1bn by 2020 (4). With the number of genomics companies, the level of investment and the number of data-intensive projects all increasing, a 4300% growth in annual data generation is predicted by 2020 (5). With such a considerable increase in data, the problems of energy efficiency, high expectancy and the rising cost of compute resources will only become more pressing, and GENESYS could provide a solution to them.
TGAC is strategically funded by BBSRC and operates a National Capability to promote the application of genomics and bioinformatics to advance bioscience research and innovation.