Bernardo Clavijo and Team,
The Genome Analysis Centre (TGAC), Norwich
|Tree||Number of scaffolds||Total sequence (Mbp)||% of Ns||N50 (Kbp)|
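The columns in the table header above (number of scaffolds, total sequence, % of Ns, N50) are standard assembly summary statistics. As an illustration only, and not the script used by the authors, they could be computed from a list of scaffold sequences along these lines. The function name `assembly_stats` and the toy sequences are assumptions made for this sketch.

```python
def assembly_stats(scaffolds):
    """Summarise a list of scaffold sequences (plain strings)."""
    lengths = sorted((len(s) for s in scaffolds), reverse=True)
    total = sum(lengths)
    n_count = sum(s.upper().count("N") for s in scaffolds)

    # N50: walking from the longest scaffold downwards, the length of the
    # scaffold at which the running total first reaches half the assembly size.
    running = 0
    n50 = 0
    for length in lengths:
        running += length
        if 2 * running >= total:
            n50 = length
            break

    return {
        "scaffolds": len(lengths),
        "total_mbp": total / 1e6,
        "pct_n": 100.0 * n_count / total if total else 0.0,
        "n50_kbp": n50 / 1e3,
    }


# Toy input (hypothetical sequences, not data from the ash assemblies).
toy = ["ACGT" * 500, "ACNN" * 250, "AC" * 250]
stats = assembly_stats(toy)
print(stats)
```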
Contigs were assembled with the DISCOVAR de novo software from 250 bp paired-end reads generated with a PCR-free protocol. We used KAT spectra-cn plots to QC motif representation, and tailored our data generation towards maximum-complexity, precisely sized, low-bias sampling.
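To make the idea behind a spectra-cn comparison concrete, the sketch below tabulates, for each k-mer seen in the reads, how often it occurs in the reads and how many copies of it appear in the assembly. This is a simplified illustration and not KAT itself: it ignores reverse complements, only considers k-mers present in the reads, and holds everything in memory. All function names are hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every k-mer of seq that contains no N."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if "N" not in kmer:
            yield kmer


def count_kmers(seqs, k):
    """Count k-mer occurrences across a collection of sequences."""
    counts = Counter()
    for s in seqs:
        counts.update(kmers(s, k))
    return counts


def spectra_cn(reads, assembly, k):
    """Map (read frequency, assembly copy number) to a count of distinct k-mers."""
    read_counts = count_kmers(reads, k)
    asm_counts = count_kmers(assembly, k)
    matrix = Counter()
    for kmer, freq in read_counts.items():
        matrix[(freq, asm_counts.get(kmer, 0))] += 1
    return matrix


# Toy example: three short reads against a one-contig assembly.
reads = ["ACGTAC", "ACGTAC", "CGTACG"]
assembly = ["ACGTAC"]
matrix = spectra_cn(reads, assembly, k=4)
for (freq, copies), n in sorted(matrix.items()):
    print(f"read frequency {freq}, assembly copies {copies}: {n} k-mers")
```

In a plot of this kind, k-mers with a high read frequency but zero assembly copies suggest missing content, while low-frequency k-mers absent from the assembly are typically sequencing errors.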
Expectation maximisation heuristics based on k-mer spectra of the raw reads were applied to the contigs to create a mosaic genome representation by collapsing the haplotypes into one choice per locus. The filtered set of contigs represents all homozygous content and roughly half of the heterozygous content, which simplifies the scaffolding stage.
Nextera long mate pair (LMP) libraries were constructed, QC’d, and selected for sequencing as described in TGAC’s published method [3], and pre-processed with a pipeline based on NextClip [4]. Haplotype-filtered contigs were scaffolded using SOAPdenovo2 [5]. SOAPdenovo2 replaces N-stretches (gaps) in contigs with Cs and Gs during scaffolding; to correct for this, contigs were mapped back to the scaffolds and the affected positions were converted back to Ns.
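The gap-restoration step can be pictured with a minimal sketch. It assumes that each contig's position on the scaffold is already known and that the placement is ungapped and on the forward strand. The text does not say which mapper was used or how placements were represented, so `restore_gaps` and its `placements` argument are hypothetical.

```python
def restore_gaps(scaffold, placements):
    """Put Ns back where the original contigs had them.

    placements: iterable of (contig_sequence, scaffold_start) pairs, assumed
    to be ungapped, forward-strand placements (an assumption of this sketch).
    """
    bases = list(scaffold)
    for contig, start in placements:
        if start < 0 or start + len(contig) > len(bases):
            raise ValueError("contig placement falls outside the scaffold")
        for offset, base in enumerate(contig):
            if base in "Nn":
                # The scaffolder wrote a C or G here; restore the gap.
                bases[start + offset] = "N"
    return "".join(bases)


# Toy example: the contig's internal gap (NNNN) appears as CGCG in the scaffold.
contig = "ACGTNNNNTTGA"
scaffold = "GG" + "ACGTCGCGTTGA" + "NNNNN" + "CCAT"
fixed = restore_gaps(scaffold, [(contig, 2)])
print(fixed)
```

A real pipeline would also need to handle reverse-strand placements and verify that the non-N bases of each contig agree with the scaffold before editing it.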
Contamination screening and filtering
Scaffolds shorter than 1 kbp were removed. The remaining scaffolds were checked for contamination against NCBI’s nucleotide database using BLAST+, and the results were joined to NCBI’s taxonomy database. Results were filtered to show hits of >98% identity over >90% of their length. From this list, scaffolds identified as contamination were removed.
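The two numeric filters described above can be sketched as follows. The sketch assumes BLAST+ tabular output requested with the columns `qseqid sseqid pident length qlen`, and it interprets "over >90% of their length" as alignment length divided by scaffold (query) length for a single hit. Both are assumptions, not details given in the text. The taxonomy join and the final decision on which hits count as contamination are not covered.

```python
import csv
import io

MIN_SCAFFOLD_LEN = 1000   # scaffolds shorter than 1 kbp are removed
MIN_IDENTITY = 98.0       # percent identity threshold
MIN_COVERAGE = 0.90       # fraction of the scaffold covered by the hit


def drop_short(scaffolds):
    """Keep only scaffolds of at least MIN_SCAFFOLD_LEN bases (dict name -> seq)."""
    return {name: seq for name, seq in scaffolds.items()
            if len(seq) >= MIN_SCAFFOLD_LEN}


def strong_hits(blast_tsv):
    """Yield hits passing both thresholds from a tab-separated file object.

    Assumed column order: qseqid, sseqid, pident, length, qlen.
    """
    reader = csv.reader(blast_tsv, delimiter="\t")
    for qseqid, sseqid, pident, length, qlen in reader:
        identity = float(pident)
        coverage = int(length) / int(qlen)
        if identity > MIN_IDENTITY and coverage > MIN_COVERAGE:
            yield qseqid, sseqid, identity, coverage


# Toy input (hypothetical scaffold and subject names).
kept = drop_short({"scaf1": "A" * 2000, "tiny": "A" * 999})
example = io.StringIO(
    "scaf1\tsubjA\t99.5\t1900\t2000\n"
    "scaf2\tsubjB\t97.0\t1900\t2000\n"
    "scaf3\tsubjC\t99.9\t500\t2000\n"
)
hits = list(strong_hits(example))
print(sorted(kept), hits)
```

Only the first toy hit passes: the second fails the identity threshold and the third covers too little of the scaffold.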
Both genomes are available for BLAST queries at the TGAC ash genome BLAST site.
3) D. Heavens, G. G. Accinelli, B. Clavijo, and M. D. Clark, “A method to simultaneously construct up to 12 differently sized Illumina Nextera long mate pair libraries with reduced DNA input, time, and cost,” BioTechniques, vol. 59, no. 1, pp. 42–45, 2015.
4) R. M. Leggett, B. J. Clavijo, L. Clissold, M. D. Clark, and M. Caccamo, “NextClip: an analysis and read preparation tool for Nextera long mate pair libraries,” Bioinformatics, p. btt702, 2013.
5) R. Luo, B. Liu, Y. Xie, Z. Li, W. Huang, J. Yuan, G. He, Y. Chen, Q. Pan, Y. Liu, J. Tang, G. Wu, H. Zhang, Y. Shi, Y. Liu, C. Yu, B. Wang, Y. Lu, C. Han, D. W. Cheung, S.-M. Yiu, S. Peng, Z. Xiaoqian, G. Liu, X. Liao, Y. Li, H. Yang, J. Wang, T.-W. Lam, and J. Wang, “SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler,” GigaScience, vol. 1, no. 1, p. 18, 2012.
Contact: Bernardo Clavijo, Algorithms Team Leader, TGAC.