Cray User Group Conference 2013

Recent progress in DNA sequencing technology has yielded a new class of devices that allow for the analysis of genetic material with unprecedented speed and efficiency. These advances, known collectively as Next Generation Sequencing (NGS), are well suited to High-Performance Computing (HPC) systems. By breaking up DNA into millions of small strands (20 to 1000 bases) and reading them in parallel, the rate at which genetic material can be acquired has increased by several orders of magnitude. As a result, raw genomic data is becoming faster and cheaper to generate than it is to analyze. In general, small reads are turned into a useful form either by assembling the individual reads without a reference (de novo) or by mapping them against a reference. In this paper we present our experience with these applications on Cray supercomputers, in particular with Ray, a parallel short-read assembler.