TGAC Ash tree assemblies (Tree 18 & 35)

The Contributors

Bernardo Clavijo and Team,
The Genome Analysis Centre (TGAC), Norwich

Assembly summary:

Tree      Number of scaffolds      Total sequence (Mbp)      % of Ns      N50 (Kbp)
Tree18      37,452      865.1      5.1      180.4
Tree35      29,847      845.6      1.4      137.6
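
For readers unfamiliar with the summary statistics above, the following is a minimal sketch of how N50 and the percentage of Ns are conventionally computed. It is not the code used to produce the table; the function names n50 and percent_n are illustrative.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (0 if empty).

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0


def percent_n(sequence):
    """Return the percentage of N characters in a sequence (0.0 if empty)."""
    if not sequence:
        return 0.0
    return 100 * sequence.upper().count("N") / len(sequence)


if __name__ == "__main__":
    print(n50([10, 8, 6, 4, 2]))
    print(percent_n("ACGTNNNNNN"))
```

In practice the lengths would be read from the scaffold FASTA files, and the table reports N50 in Kbp rather than bp.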

Contig assembly

Contigs were assembled from 250bp paired-end reads generated from a PCR-free protocol. The DISCOVAR de novo software [1] was used. We used KAT [2] spectra-cn plots to QC motif representation, and tailored our data generation towards a maximum complexity, precisely sized, low bias sampling.

Haplotype filter

Expectation maximisation heuristics based on k-mer spectra of the raw reads were applied to the contigs to create a mosaic genome representation by collapsing the haplotypes into one choice per locus. The filtered set of contigs represents all homozygous content and roughly half of the heterozygous content, which simplifies the scaffolding stage.


Scaffolding

Nextera long mate pair (LMP) libraries were constructed, QC’d, and chosen for sequencing as described in TGAC’s published method [3], and pre-processed with a pipeline based on NextClip [4]. Haplotype-filtered contigs were scaffolded using SOAPdenovo2 [5]. SOAPdenovo2 replaces N-stretches (gaps) in contigs with Cs and Gs during scaffolding; to correct for this, contigs were mapped back to the scaffolds and the affected gaps were converted back to Ns.

Contamination screening and filtering

Scaffolds shorter than 1kbp were removed. The remaining scaffolds were checked for contamination against NCBI’s nucleotide database using BLAST+, and the results were joined to NCBI’s taxonomy database. Results were filtered to show hits of >98% identity over >90% of their length. From this list, scaffolds identified as contamination were removed.
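
The identity and length thresholds can be illustrated with a short sketch. It assumes BLAST+ tabular output requested with the columns qseqid, sseqid, pident, length and qlen, and it interprets "over >90% of their length" as alignment length divided by scaffold length. Both the column layout and that interpretation are assumptions, and the function name is illustrative; the original pipeline may have differed.

```python
def contamination_hits(rows, min_identity=98.0, min_coverage=90.0):
    """Yield (scaffold, subject) pairs that pass both thresholds.

    rows: iterable of tab-separated lines with the ASSUMED columns
    qseqid, sseqid, pident, length, qlen.
    """
    for line in rows:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, sseqid, pident, length, qlen = line.split("\t")[:5]
        # Assumed definition: aligned length as a percentage of scaffold length.
        coverage = 100.0 * int(length) / int(qlen)
        if float(pident) > min_identity and coverage > min_coverage:
            yield qseqid, sseqid


if __name__ == "__main__":
    example = [
        "scaf1\tsubjA\t99.5\t950\t1000",   # passes both thresholds
        "scaf2\tsubjB\t97.0\t1000\t1000",  # identity too low
        "scaf3\tsubjC\t99.9\t500\t1000",   # covers only half the scaffold
    ]
    print(list(contamination_hits(example)))
```

The surviving hits would still need to be joined to taxonomy and reviewed before any scaffold is removed, as described above.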

Assemblies are available to download from the OADB FTP site
Tree 18 assembly

Tree 35 assembly

Both genomes can be queried with BLAST at
TGAC ash genome blast site

1) http://www.broadinstitute.org/software/discovar/blog/
2) http://www.tgac.ac.uk/KAT/
3) D. Heavens, G. G. Accinelli, B. Clavijo, and M. D. Clark, “A method to simultaneously construct up to 12 differently sized Illumina Nextera long mate pair libraries with reduced DNA input, time, and cost.,” BioTechniques, vol. 59, no. 1, pp. 42–45, 2015.
4) R. M. Leggett, B. J. Clavijo, L. Clissold, M. D. Clark, and M. Caccamo, “NextClip: an analysis and read preparation tool for Nextera long mate pair libraries,” Bioinformatics, p. btt702, 2013.
5) R. Luo, B. Liu, Y. Xie, Z. Li, W. Huang, J. Yuan, G. He, Y. Chen, Q. Pan, Y. Liu, J. Tang, G. Wu, H. Zhang, Y. Shi, Y. Liu, C. Yu, B. Wang, Y. Lu, C. Han, D. W. Cheung, S.-M. Yiu, S. Peng, Z. Xiaoqian, G. Liu, X. Liao, Y. Li, H. Yang, J. Wang, T.-W. Lam, and J. Wang, “SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler.,” Gigascience, vol. 1, no. 1, p. 18, 2012.

Contact: Bernardo Clavijo, Algorithms Team Leader, TGAC.

Genome sequencing of 9 species from the genus, Hymenoscyphus.

The Contributors

Christine Sambles, Karen Moore, Exeter Sequencing Service, Mick Kershaw, Chris Thornton, Murray Grant and David Studholme.
University of Exeter, Devon.


Genome sequencing and assembly of 9 different species from the Hymenoscyphus genus.

Table 1: The 9 sequenced species with NCBI GenBank and Short Read Archive (SRA) accession numbers:

Species      Strain      GenBank accession      SRA accession
Hymenoscyphus fructigenus      CBS 650.92      LKUV00000000      SRX1322313
Hymenoscyphus scutula      CBS 480.97      LKTO00000000      SRX1322311
Hymenoscyphus varicosporoides      CBS 651.66      LLCF00000000      SRX1322314
Hymenoscyphus repandus      CBS 341.76      LLCE00000000      SRX1322310
Hymenoscyphus salicellus      CBS 111550      LLCD00000000      SRX1322295
Hymenoscyphus fraxineus      CBS 133217      LLCC00000000      SRX1322294
Hymenoscyphus infarciens      CBS 122016      LLCB00000000      SRX1322117
Hymenoscyphus herbarum (Calycina herbarum)      CBS 466.73      LLEY00000000      SRX1325539
Hymenoscyphus laetus*      CBS 340.76      LLCA00000000      SRX1322158

*ITS phylogenetic analysis suggests this is not a Hymenoscyphus spp. but is more likely to be a species from the Phaeosphaeria genus.

Table 2: Assembly statistics of the 9 sequenced Hymenoscyphus spp. genomes:

Species      Strain      Genome size (bp)      # contigs      N50      % GC
Hymenoscyphus fructigenus      CBS 650.92      61,124,938      504      373,842      43.09
Hymenoscyphus scutula      CBS 480.97      63,226,382      2,591      72,955      41.07
Hymenoscyphus varicosporoides      CBS 651.66      31,978,795      206      585,492      46.36
Hymenoscyphus repandus      CBS 341.76      42,813,648      925      226,427      45.61
Hymenoscyphus salicellus      CBS 111550      57,952,665      9,749      9,574      44.34
Hymenoscyphus fraxineus      CBS 133217      51,524,987      4,749      24,374      43.65
Hymenoscyphus infarciens      CBS 122016      68,152,485      1,714      156,341      37.56
Hymenoscyphus herbarum      CBS 466.73      69,308,136      529      325,154      41.67
Hymenoscyphus laetus*      CBS 340.76      36,473,783      471      257,815      51.89

Fig 1: ITS phylogenetic tree of 8 sequenced genomes and other members of the Helotiales order, not including H. laetus (putatively a Phaeosphaeria spp.)


Comment: Ash pathogen Hymenoscyphus pseudoalbidus renamed Hymenoscyphus fraxineus

J Allan Downie, John Innes Centre

On this web site (in line with current usage to date) the Chalara ash dieback pathogen has been referred to as Chalara fraxinea (asexual morph) and Hymenoscyphus pseudoalbidus (sexual morph). In line with the recent publication altering the nomenclature of this fungus (Baral et al 2014), it is suggested that on this web site the new name Hymenoscyphus fraxineus should now be used.

Genome sequencing of 23 strains of H. pseudoalbidus from Europe

The Contributors

Georgios Koutsovoulos, Mark Blaxter
The University of Edinburgh

Adam Vivian-Smith & Ari Hietala
Skog og Landskap – Norwegian Forest and Landscape Institute
Bioforsk – Norwegian Institute for Agricultural and Environmental Research

Renaud Ioos & colleagues
Unité de Mycologie, Laboratoire de la Santé des Végétaux, Domaine de Pixérécourt, Malzéville, France


Genome assembly summary of 23 Strains of H. pseudoalbidus sequenced at The University of Edinburgh

Table 1: 23 strains of H. pseudoalbidus from Europe

Strain Year Country
2008-81/6      2008      NORWAY
2008-125/2      2008      NORWAY
2008-139/1      2008      NORWAY
2008-142/5      2008      NORWAY
2008-148/4      2008      NORWAY
2008-152/4      2008      NORWAY
2009-86/3      2009      NORWAY
2010-189/4      2010      NORWAY
2010-189/5      2010      NORWAY
2011-11/1      2011      NORWAY
2012-24/1      2012      NORWAY
2012-38/2/2      2012      NORWAY
2012-42/1/1      2012      NORWAY
CBS122191      ?      AUSTRIA
CBS122503      2011      POLAND
CBS122504      2005      POLAND
CBS122505      2000      POLAND
CBS122507      2000      POLAND
FON-M-1      2009      FRANCE
GIR-M-2      2009      FRANCE
LAN-M-1      2009      FRANCE
MIG-M-1      2009      FRANCE
LSVM82      2008      FRANCE

Table 2: Summary of assembly stats

Strain      contigs (>500)      span of contigs (MB)      N50      GC%      diff. from K1      indels from K1      span of indels
2008-125/2 6267 53.5 24303 42.67 193829 37014 75714
2008-139/1 6041 53.5 24414 42.69 174930 32599 68413
2008-142/5 6138 53.5 25254 42.72 180469 32502 69266
2008-148/4 7560 51.8 18776 43.18 175814 32875 68400
2008-152/4 6736 52.8 22327 42.9 176916 34024 70682
2008-81/6 7387 51.6 19538 43.19 187222 36522 72017
2009-86/3 6160 53.5 25295 42.72 164775 32603 64923
2010-189/4 6348 53.3 23729 42.78 179776 33438 70108
2010-189/5 6331 53.4 24880 42.77 179106 33429 69312
2011-11/1 6118 53.4 25094 42.73 188201 37279 75145
2012-24/1 6317 53.3 24080 42.77 177522 33805 69614
2012-38/2/2 6094 53.5 24711 42.67 174888 33466 67026
2012-42/1/1 6516 53.1 23078 42.83 173764 32218 67008
CBS122191 6252 53.5 24172 42.71 181217 32820 69030
CBS122503 6700 53.2 22119 42.77 195226 35872 74082
CBS122504 6365 56.7 30182 41.76 191934 33400 68770
CBS122505 6422 53.5 23708 42.67 172436 32440 66584
CBS122507 6387 53.1 23038 42.79 180121 34599 70054
FON-M-1 6436 53.3 23037 42.73 179944 35230 70160
GIR-M-2 4770 51.9 34647 43.59 196420 37771 95254
LAN-M-1 6534 52.5 23965 43.15 168898 31707 64028
LSVM82 35509 77.3 19479 38.33 171274 34758 92767
MIG-M-1 6498 52.8 23204 42.91 200012 36898 78521

Reports and Raw data

Reports and presentations of complete assemblies are available at OADB github Assembly reports of European strains

Raw data were submitted to the European Nucleotide Archive (ENA) under accession ERP006093.
Further information on the raw data and assembled genomes is available at Genome assemblies of European strains
and at http://www.ebi.ac.uk/ena/data/view/ERS480843-ERS480865

20 UK isolates sequenced and submitted by The Genome Analysis Centre

The Contributors

Mark McMullan, Matt Clark, Louisa Williamson, James Lipscombe, Rachel Piddock, Fiore Cugliandolo, Fiona Fraser, Tom Barker, Mario Caccamo
The Genome Analysis Centre, Norwich


Hymenoscyphus pseudoalbidus (Chalara fraxinea) strains were isolated from across Great Britain and sent to TGAC by the Food and Environment Research Agency (FERA). FERA purified DNA from each isolate and sent it to TGAC, where it was QC’d, libraries were constructed, and sequencing was carried out on a HiSeq2500 paired-end run (150bp). The 20 isolates were sequenced over two lanes (20 isolates per lane) to an average of ~50x depth per isolate.
Following are the 20 isolates

and their sequence information is available at OADB github

Downstream analyses (genome assembly, phylogenies and selection analyses) have been done at TGAC (MM).
These analyses await data from European isolates sequenced in Edinburgh.

Analysis of UK Ash diversity set- morphological traits and disease susceptibility

The Contributors

Robert J. Saville, Tom Passey, Judit Linka, Karen Russell and Richard J. Harrison
Genetics and Crop Improvement, East Malling Research


As part of the Nornex consortium, EMR has been screening a collection of UK ash clones collected as part of historic DEFRA projects and by members of the Future Trees Trust. A partial analysis of the diversity of UK ash has previously been reported (Sutherland et al. 2010). Throughout the year, trees were evaluated for floral sexual morphology, leaf emergence, senescence and presence of potential Hymenoscyphus pseudoalbidus infection. The ultimate aim of this work is to identify putatively resistant trees and to ascertain whether previously reported correlations between senescence date and disease tolerance can be observed in UK ash material (Kjaer et al. 2012; McKinney et al. 2011; McKinney et al. 2012).


Ash populations, described in the downloadable spreadsheet, are for the most part duplicate populations, planted in two phases in 2008-2009.

Methods and Results

Tree sex was determined using the trait descriptors shown in Figure 1 (below). Data are presented in the supplementary Excel file, in the tab labeled Tree Sex. These data are valuable for future breeding and selection of both males and females that display resistance.

Figure 1. Flower types observed on ash (Fraxinus excelsior L.). a) male flower (prior to anthesis), b) hermaphrodite flower with rudimentary gynoecium (functionally male), c) hermaphrodite flower, d) hermaphrodite flower with vestigial anthers (functionally female) and, e) female flower.

Leaf emergence was scored based on the trait descriptors shown in Figure 2 (below). Data are presented in the supplementary Excel file, in the tab labeled Leaf Emergence. These data may be useful when identifying traits correlated with local niche (i.e. altitude/latitude), which may be important for the successful introduction of resistant material in future.

Figure 2. Leaf emergence scored on a five point scale based on level of emergence.

Senescence was recorded using three different descriptors (listed in Tables 1-3). These traits were leaf loss, leaf colour and rachis retention, all of which may be significantly related to disease escape. These data are presented in the supplementary Excel file, in the tab labeled Senescence.

Table 1: Trait descriptor for leaf loss

Leaf Loss Trait Description
1                   no leaf loss
2                   1-25% leaf loss
3                   26-50% leaf loss
4                   51-75% leaf loss
5                   76-99% leaf loss
6                   100% leaf loss
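
When working with the spreadsheet, it can be convenient to convert an estimated percentage of leaf loss into the six-point score above. The sketch below is one possible mapping; the function name is illustrative, and the treatment of fractional percentages (anything above 0 and up to 25 scores 2, anything above 75 but below 100 scores 5) is an assumption, since the table only defines whole-number bands.

```python
def leaf_loss_score(percent_lost):
    """Map a percentage of leaf loss (0-100) to the 1-6 trait score."""
    if not 0 <= percent_lost <= 100:
        raise ValueError("percent_lost must be between 0 and 100")
    if percent_lost == 0:
        return 1  # no leaf loss
    if percent_lost == 100:
        return 6  # 100% leaf loss
    if percent_lost <= 25:
        return 2
    if percent_lost <= 50:
        return 3
    if percent_lost <= 75:
        return 4
    return 5  # 76-99% leaf loss


if __name__ == "__main__":
    for value in (0, 10, 40, 60, 90, 100):
        print(value, leaf_loss_score(value))
```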


Table 2: Trait descriptor for leaf colour scale (adapted from McKinney et al. 2011)

Leaf Colour Trait Description
1          dark green leaves
2          ~25% yellow leaves
3          ~50% yellow leaves
4          ~75% yellow leaves
5          completely yellow and fading leaves
6          necrosis (brown leaves)


Table 3: Trait descriptor for rachis retention (to assess disease escape significance)

Rachis retention scale Description
Y          rachis detach easily when pulled through hand
N          rachis do not detach when pulled through hand

Disease observations

Disease was recorded throughout the 2013 season; barring a single pre-existing lesion, no symptoms of foliar disease were observed during the growing season. However, in early 2014 disease assessment of dormant trees revealed several accessions with lesions on first year wood, i.e. infection that occurred during 2013 but was not observed in that season's assessments. These results are presented in the Disease tab of the supplementary Excel file, and full diary records of the recording dates are in the season diary tab. Subsequent isolation from the leading edge of suspect lesions confirmed the presence of cultures consistent with H. pseudoalbidus. PCR validation is underway, though all hallmarks of both the lesions and the subsequent cultures are indicative of H. pseudoalbidus being present.

Link to raw data at OADB github supplementary excel file


Kjaer, E.D. et al., 2012. Adaptive potential of ash (Fraxinus excelsior) populations against the novel emerging pathogen Hymenoscyphus pseudoalbidus. Evolutionary Applications, 5(3), pp.219–228.
McKinney, L. V et al., 2011. Presence of natural genetic resistance in Fraxinus excelsior (Oleraceae) to Chalara fraxinea (Ascomycota): an emerging infectious disease. Heredity, 106(5), pp.788–97.
McKinney, L. V. et al., 2012. Genetic resistance to Hymenoscyphus pseudoalbidus limits fungal growth and symptom occurrence in Fraxinus excelsior. Forest Pathology, 42(1), pp.69–74.
Sutherland, B.G. et al., 2010. Molecular biodiversity and population structure in common ash (Fraxinus excelsior L.) in Britain: implications for conservation. Molecular ecology, 19(11), pp.2196–211.

The mitochondrial genome of H. pseudoalbidus

The Contributors

Rachel Glover, FERA.

The material

In order to identify sequences potentially originating from the
mitochondrial genome of H. pseudoalbidus we downloaded the 248
fully sequenced ascomycete mitochondrial genomes from
GenBank and used these sequences as a BLAST database to screen the
genomic contigs for potential mitochondrial origin.

The result

Fifty-seven contigs
were identified with significant similarity to ascomycete mitochondrial
sequences. Further examination of these 57 contigs showed that many
contigs were identical but in reverse complement or extending by a few
hundred base pairs. These contigs were collapsed to form a dataset of 45
contigs ranging in length from 109-14,731bp and GC-contents ranging from
9.2-45.9 % (Figure 1). Most of the contigs >5kb fall into a GC content
range of 30-40 %, typical of AT-rich mitochondrial sequences. It may be
that the AT rich repeat islands discussed above are mitochondrial in
origin: as the mitochondrial genome will be more prevalent in the
sequence dataset, this would explain the increase in abundance of those
Figure 1. Contigs identified as potentially mitochondrial in origin, by similarity search. A plot of length vs GC content.
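
The collapsing step and the GC-content calculation described above can be sketched as follows. This is an illustration under simplifying assumptions, not the procedure actually used: it only removes contigs that are exact duplicates or exact reverse complements of one another, and does not handle contigs that merely extend another by a few hundred base pairs. All function names are illustrative.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def collapse_reverse_complements(contigs):
    """Keep one copy of contigs that are identical or exact reverse complements.

    contigs: dict mapping contig name to sequence; the first name seen is kept.
    """
    seen = set()
    kept = {}
    for name, seq in contigs.items():
        seq = seq.upper()
        # A strand-independent key: the smaller of the two orientations.
        canonical = min(seq, reverse_complement(seq))
        if canonical not in seen:
            seen.add(canonical)
            kept[name] = seq
    return kept


def gc_percent(seq):
    """Return GC content as a percentage of unambiguous bases (0.0 if none)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


if __name__ == "__main__":
    toy = {"a": "AACG", "b": "CGTT", "c": "ATAT"}  # "b" is the reverse complement of "a"
    for name, seq in collapse_reverse_complements(toy).items():
        print(name, len(seq), gc_percent(seq))
```

Plotting length against gc_percent for each retained contig would give a figure of the kind shown in Figure 1.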

The total length of the 45 mitochondrial contigs is
156,026bp with no significant overlap. If this preliminary estimate is accurate, H. pseudoalbidus would have the largest
mitochondrial genome sequenced from the ascomycetes so far (see Figure 2), although we expect the size to reduce with further work.

Figure 2. Histogram of mitochondrial contig length for all sequenced ascomycete mitochondrial genomes.


A number of factors have prevented the construction of a finished
mitochondrial genome at this time. Firstly, the potential mitochondrial
contigs were identified based upon similarity based searches against
current ascomycete mitochondrial genomes. The similarity based approach
to finding mitochondrial sequences within a nuclear genome sequencing
project may have misidentified some of these contigs as mitochondrial
when in fact they are nuclear integrations of portions of the true
mitochondrial genome (NUMTs). This is likely to have artificially
inflated our estimate of the size of the H. pseudoalbidus mitochondrial
genome. Annotation of the potential mitochondrial contigs is in progress
and there are early indications of a very large number of introns
(intronic ORFs) present in the mitochondrial genome of H. pseudoalbidus.
The second complicating factor in attempting to assemble the
mitochondrial genome at this time is the large number of AT repeats
present in the sequences we have identified as being mitochondrial in
origin. The repeats are likely to be collapsed and appear to be at the
ends of the contigs we have identified, preventing further assembly
without additional sequencing.

FIR analysis: genes encoding predicted secreted proteins occur in both gene sparse and gene dense regions of the H. pseudoalbidus genome

The contributors

Daniel Bunting (Nuffield student), Kentaro Yoshida, Dan MacLean and Diane Saunders at TSL.

The material

We used the potential Hymenoscyphus pseudoalbidus KW1 effector candidates identified in (http://oadb.tsl.ac.uk/?m=20130910).

Background information

In filamentous plant pathogens such as the late blight oomycete pathogen Phytophthora infestans, a repeat-driven expansion has created repeat and transposable element (TE) rich, gene-sparse regions that are distinct from the gene-dense conserved regions; this is known as a two-speed genome architecture. The distance from a gene to its closest coding gene neighbours on either side (its flanking intergenic regions, FIRs) can be used to determine whether the gene resides in a gene-dense or gene-sparse environment. Given that genes associated with pathogenicity tend to have long FIRs in pathogen genomes, genome architecture could be used to identify new candidate pathogenicity genes.
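
As an illustration of the FIR concept, the sketch below computes 5′ and 3′ FIR lengths for genes on a single contig. It is a simplified example, not the code used in this analysis: the function name is illustrative, coordinates are assumed to be 1-based and inclusive, overlapping neighbours are given a FIR of 0, genes with no neighbour on one side get None, and nested genes are not treated specially.

```python
def flanking_intergenic_regions(genes):
    """Compute (5' FIR, 3' FIR) lengths for genes on ONE contig.

    genes: list of (name, start, end, strand) with start <= end and
    strand "+" or "-". Returns {name: (fir5, fir3)}.
    """
    ordered = sorted(genes, key=lambda g: (g[1], g[2]))
    firs = {}
    for i, (name, start, end, strand) in enumerate(ordered):
        # Gap to the previous gene along the contig, if any.
        left = max(0, start - ordered[i - 1][2] - 1) if i > 0 else None
        # Gap to the next gene along the contig, if any.
        right = max(0, ordered[i + 1][1] - end - 1) if i + 1 < len(ordered) else None
        # On the minus strand the 5' end is at the higher coordinate.
        firs[name] = (left, right) if strand == "+" else (right, left)
    return firs


if __name__ == "__main__":
    toy = [("g1", 100, 200, "+"), ("g2", 301, 400, "-"), ("g3", 1401, 1500, "+")]
    for gene, (fir5, fir3) in flanking_intergenic_regions(toy).items():
        print(gene, fir5, fir3)
```

Binning the resulting pairs on log-scaled axes gives the kind of two-dimensional FIR distribution commonly used to compare gene-dense and gene-sparse regions.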

The analysis

To investigate whether a similar organisation occurs in the genome of H. pseudoalbidus, we first identified candidate effector genes in the gene annotations (http://oadb.tsl.ac.uk/?m=20130910). To determine whether genes encoding secreted proteins are in gene-sparse or gene-dense regions of the genome, we extended the de novo gene calls using RNA-seq data, based on overlaps with transcripts, to create the file extended_genes.gff. RNA-seq reads from KW1 were aligned against the KW1 assembly using BWA. For each gene model in the TGAC gene predictions that was within 100nt of another gene, we extracted reads on the same strand that fell within 1000nt upstream of the start or 1000nt downstream of the end. Beginning at the start and end of the gene, we followed read overlaps outwards as far as possible, until reads no longer overlapped. The most distal read then defined the new gene start or end.
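
The read-overlap extension can be sketched as below. This is one reading of the procedure, not the original script: the function name is illustrative, reads are assumed to be already filtered to the gene's strand and given as 1-based inclusive (start, end) pairs, "start" and "end" refer to the lower and higher contig coordinates, and only reads lying entirely within the 1000nt window on either side of the gene are considered. Selecting the genes that lie within 100nt of a neighbour is not shown.

```python
def extend_gene(gene_start, gene_end, reads, window=1000):
    """Extend a gene model outward by chaining overlapping reads.

    reads: iterable of (start, end) alignments on the gene's strand.
    Returns the (new_start, new_end) of the extended gene model.
    """
    lo, hi = gene_start - window, gene_end + window
    # Assumed filter: keep reads that lie entirely within the window.
    candidates = sorted((s, e) for s, e in reads if s >= lo and e <= hi)
    new_start, new_end = gene_start, gene_end
    changed = True
    while changed:  # repeat until no read extends the current span
        changed = False
        for s, e in candidates:
            if s <= new_end and e >= new_start:  # read overlaps current span
                if s < new_start:
                    new_start, changed = s, True
                if e > new_end:
                    new_end, changed = e, True
    return new_start, new_end


if __name__ == "__main__":
    toy_reads = [(1950, 2100), (2080, 2230), (2400, 2500), (900, 1010), (700, 850)]
    print(extend_gene(1000, 2000, toy_reads))
```

In the toy example the reads at 2400-2500 and 700-850 are not reached because no chain of overlapping reads connects them to the gene.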

The FIR distribution for genes in the H. pseudoalbidus genome can be seen below and is indicative of a single speed genome, with genes encoding secreted proteins dispersed both in gene-sparse and gene-dense regions of the genome.



Figure. The single speed H. pseudoalbidus genome. Distribution of H. pseudoalbidus genes according to the length of their 5′ and 3′ flanking intergenic regions (FIRs). Red circles, core genes; blue circles, genes encoding predicted secreted proteins.