OpenAssembler: assembly of reads from a mix of high-throughput sequencing technologies.
Sébastien Boisvert, François Laviolette, and Jacques Corbeil.
Robert Cedergren Bioinformatics Colloquium 2009 (Université de Montréal). 



An accurate and complete genome sequence of a desired species, or of a phylogenetically close relative, is now a basic prerequisite for advanced genomics research. A crucial step in obtaining a high-quality genome sequence is correctly assembling short individual sequence reads into longer contiguous sequences (contigs) that represent genomic regions much longer than any single contributing read. Current sequencing technologies continue to offer increases in throughput and corresponding reductions in cost and time. Unfortunately, the benefit of obtaining very large numbers of reads is complicated by a non-trivial rate of sequencing errors, and the types of errors and biases differ from one sequencing system to another. Although assembly software exists for reads from each individual system, no comprehensive procedure has been proposed for high-quality genome assembly from mixes of reads produced by different technologies. We describe an open-source program, OpenAssembler, developed specifically to assemble reads obtained from a combination of sequencing systems, and we compare its performance with that of other assembly packages on simulated and real datasets. To illustrate the value of OpenAssembler, we used a combination of Roche/454 and Illumina reads to assemble the 3.6 Mb Acinetobacter baylyi ADP1 genome (NCBI/GenBank accession CR543861) into 119 contigs containing 26 mismatches and 7 indels. Using only the Roche/454 reads (the reads it was designed for), the Newbler assembler assembled the same genome into 118 contigs with 64 mismatches and 356 indels.
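This abstract does not describe OpenAssembler's algorithm. To make the general idea of assembling short reads into longer contigs concrete, the sketch below uses a naive greedy overlap merge on error-free toy reads. It is a textbook illustration only and not OpenAssembler's method. The function names `overlap` and `greedy_assemble` and the `min_len` threshold are hypothetical. Real assemblers must also handle sequencing errors, repeats and reverse complements, none of which this sketch attempts.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix of `b`,
    provided it is at least `min_len` long; otherwise return 0."""
    start = 0
    while True:
        # Look for the first min_len bases of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # If the rest of a from that point is a prefix of b, it is a suffix/prefix overlap.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Hypothetical toy assembler: repeatedly merge the pair of reads with the
    longest overlap until no overlap of at least min_len remains."""
    # Drop exact duplicates, then reads fully contained in another read.
    reads = list(dict.fromkeys(reads))
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return reads  # whatever remains are the contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)


# Four overlapping 10-base reads drawn from a 19-base toy sequence.
toy_reads = ["ATTAGACCTG", "AGACCTGCCG", "CCTGCCGGAA", "GCCGGAATAC"]
print(greedy_assemble(toy_reads))  # → ['ATTAGACCTGCCGGAATAC']
```

Here the greedy merges rebuild the whole toy sequence as a single contig only because the reads are error-free and the sequence has no repeats. With real reads, erroneous bases and repeated regions break or mislead such merges, which is why assemblies like those reported above end up as many contigs containing some mismatches and indels.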