Monthly Archives: April 2014

The mitochondrial genome of H. pseudoalbidus

The Contributors

Rachel Glover, FERA.

The material

In order to identify sequences potentially originating from the
mitochondrial genome of H. pseudoalbidus we downloaded the 248
fully sequenced ascomycete mitochondrial genomes from
Genbank and used these sequences as a BLAST database to screen the
genomic contigs for potential mitochondrial origin.
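
The screening step can be sketched as follows. This is a minimal illustration, not the pipeline used for the analysis: it assumes the BLAST results were saved in the standard 12-column tabular format (query ID in column 1, e-value in column 11), and the `1e-5` e-value cutoff, the function name and the example rows are all hypothetical.

```python
import csv
import io

def contigs_with_hits(blast_tsv, max_evalue=1e-5):
    """Return the set of query contig IDs with at least one hit at or below max_evalue.

    blast_tsv is any iterable of lines in 12-column BLAST tabular format.
    The cutoff is an illustrative assumption, not the value used in the post.
    """
    hits = set()
    for row in csv.reader(blast_tsv, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, evalue = row[0], float(row[10])
        if evalue <= max_evalue:
            hits.add(qseqid)
    return hits

# Made-up example rows: contig_1 has two strong hits, contig_2 only a weak one.
example = io.StringIO(
    "contig_1\tmito_A\t92.5\t400\t30\t0\t1\t400\t100\t499\t1e-150\t560\n"
    "contig_2\tmito_B\t78.0\t60\t13\t0\t5\t64\t10\t69\t0.5\t30.1\n"
    "contig_1\tmito_C\t88.0\t300\t36\t0\t1\t300\t1\t300\t1e-90\t380\n"
)
candidates = contigs_with_hits(example)
print(sorted(candidates))
```

Collecting query IDs into a set means a contig with hits to several reference genomes is still counted once.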

The result

Fifty-seven contigs
were identified with significant similarity to ascomycete mitochondrial
sequences. Further examination of these 57 contigs showed that many
were identical to one another but in reverse complement, or differed only by
extending a few hundred base pairs further. These contigs were collapsed to form a dataset of 45
contigs ranging in length from 109 to 14,731 bp and in GC content from
9.2 to 45.9 % (Figure 1). Most of the contigs >5 kb fall into a GC content
range of 30-40 %, typical of AT-rich mitochondrial sequences. It may be
that the AT-rich repeat islands discussed above are mitochondrial in
origin: because the mitochondrial genome will be more prevalent in the
sequence dataset, this would explain the increase in abundance of those repeats.

Figure 1. Contigs identified as potentially mitochondrial in origin, by similarity search. A plot of length vs GC content.
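
The collapsing and GC-content steps behind Figure 1 could look something like the sketch below. It is an assumption-laden illustration: it only merges contigs that are exact duplicates or exact reverse complements of one another, and does not attempt the "extending by a few hundred base pairs" case described above. The function names and toy sequences are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T, either case)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]

def gc_percent(seq):
    """GC content as a percentage of called (A/C/G/T) bases."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / called

def collapse_exact_duplicates(contigs):
    """Keep one contig per sequence, treating a sequence and its
    reverse complement as the same contig."""
    seen = set()
    kept = {}
    for name, seq in contigs.items():
        upper = seq.upper()
        key = min(upper, reverse_complement(upper))  # strand-independent key
        if key not in seen:
            seen.add(key)
            kept[name] = seq
    return kept

# Toy data: c2 is the reverse complement of c1.
contigs = {"c1": "AATTGC", "c2": "GCAATT", "c3": "ATATATGC"}
kept = collapse_exact_duplicates(contigs)
for name, seq in kept.items():
    print(name, len(seq), round(gc_percent(seq), 1))
```

Using the lexicographically smaller of a sequence and its reverse complement as the key is one simple way to make the comparison strand-independent.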

The total length of the 45 mitochondrial contigs is
156,026 bp with no significant overlap. If this preliminary estimate is accurate, H. pseudoalbidus would have the largest
mitochondrial genome sequenced from the ascomycetes so far (see Figure 2), although we expect the size to reduce with further work.

Figure 2. Histogram of mitochondrial contig length for all sequenced ascomycete mitochondrial genomes.


A number of factors have prevented the construction of a finished
mitochondrial genome at this time. Firstly, the potential mitochondrial
contigs were identified by similarity-based searches against
current ascomycete mitochondrial genomes. The similarity-based approach
to finding mitochondrial sequences within a nuclear genome sequencing
project may have misidentified some of these contigs as mitochondrial
when in fact they are nuclear integrations of portions of the true
mitochondrial genome (NUMTs). This is likely to have artificially
inflated our estimate of the size of the H. pseudoalbidus mitochondrial
genome. Annotation of the potential mitochondrial contigs is in progress
and there are early indications of a very large number of introns
(intronic ORFs) present in the mitochondrial genome of H. pseudoalbidus.
The second complicating factor in attempting to assemble the
mitochondrial genome at this time is the large number of AT repeats
present in the sequences we have identified as being mitochondrial in
origin. The repeats are likely to be collapsed and appear to be at the
ends of the contigs we have identified, preventing further assembly
without additional sequencing.

FIR analysis: genes encoding predicted secreted proteins occur in both gene sparse and gene dense regions of the H. pseudoalbidus genome

The contributors

Daniel Bunting (Nuffield student), Kentaro Yoshida, Dan MacLean and Diane Saunders at TSL.

The material

We used the potential Hymenoscyphus pseudoalbidus KW1 effector candidates identified in (

Background information

In filamentous plant pathogens such as the late blight oomycete pathogen Phytophthora infestans, a repeat-driven expansion has created repeat- and transposable element (TE)-rich, gene-sparse regions that are distinct from the gene-dense conserved regions, an arrangement known as a two-speed genome architecture. The distance from a gene to its closest coding gene neighbours on either side (its flanking intergenic regions, FIRs) can be used to determine whether the gene resides in a gene-dense or gene-sparse environment. Given that genes associated with pathogenicity tend to have long FIRs in pathogen genomes, genome architecture could be used to identify new candidate pathogenicity genes.
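
As a rough illustration of how FIR lengths can be derived from gene coordinates, the sketch below computes the 5′ and 3′ FIR for each gene on a contig. It assumes 1-based inclusive coordinates (as in GFF files), clamps overlapping neighbours to a distance of zero, and reports `None` where a gene has no neighbour on that side. The `Gene` tuple, function name and example coordinates are hypothetical.

```python
from collections import defaultdict, namedtuple

Gene = namedtuple("Gene", ["name", "contig", "start", "end", "strand"])

def flanking_intergenic_regions(genes):
    """Map gene name -> (5' FIR, 3' FIR) in nucleotides.

    A side with no neighbouring gene on the same contig is reported as None.
    """
    by_contig = defaultdict(list)
    for gene in genes:
        by_contig[gene.contig].append(gene)

    firs = {}
    for contig_genes in by_contig.values():
        ordered = sorted(contig_genes, key=lambda g: g.start)
        for i, gene in enumerate(ordered):
            left = right = None
            if i > 0:
                left = max(gene.start - ordered[i - 1].end - 1, 0)
            if i + 1 < len(ordered):
                right = max(ordered[i + 1].start - gene.end - 1, 0)
            # On the minus strand the 5' side is the right-hand neighbour.
            firs[gene.name] = (left, right) if gene.strand == "+" else (right, left)
    return firs

genes = [
    Gene("geneA", "contig1", 100, 500, "+"),
    Gene("geneB", "contig1", 1501, 2000, "-"),
    Gene("geneC", "contig1", 2101, 3000, "+"),
]
firs = flanking_intergenic_regions(genes)
print(firs)
```

Plotting the 5′ FIR against the 3′ FIR for every gene gives the kind of two-dimensional distribution used to judge whether a genome has distinct gene-sparse and gene-dense compartments.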

The analysis

To investigate whether a similar organisation occurs in the genome of H. pseudoalbidus we first identified candidate effector genes in the gene annotations. To determine whether genes encoding secreted proteins lie in gene-sparse or gene-dense regions of the genome, we modified the de novo gene calls using RNA-seq data, extending each gene model where it overlapped transcripts, to create the file extended_genes.gff. We aligned the RNA-seq reads from KW1 against the KW1 assembly using BWA. For each gene model in the TGAC gene predictions that was within 100 nt of another gene, we extracted reads on the same strand that fell within 1,000 nt upstream of the start or 1,000 nt downstream of the end. Starting from the annotated start and end of the gene, we followed read overlaps outwards as far as possible, until reads no longer overlapped. The most distal read then defined the new gene start or end.
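
The read-overlap extension described above can be sketched as a simple interval-chaining routine. This is an illustrative reading of the description rather than the script that produced extended_genes.gff: reads are represented as hypothetical `(start, end)` tuples already filtered to the gene's strand, coordinates are 1-based inclusive, and a read is considered if it reaches into the 1,000 nt window next to the gene boundary.

```python
def extend_end(gene_end, reads, window=1000):
    """Extend the gene end downstream by chaining overlapping reads."""
    limit = gene_end + window
    candidates = sorted(r for r in reads if r[1] > gene_end and r[0] <= limit)
    new_end = gene_end
    for start, end in candidates:
        if start > new_end:  # gap between reads: the overlap chain is broken
            break
        new_end = max(new_end, end)
    return new_end

def extend_start(gene_start, reads, window=1000):
    """Extend the gene start upstream by chaining overlapping reads."""
    limit = gene_start - window
    candidates = sorted(
        (r for r in reads if r[0] < gene_start and r[1] >= limit),
        key=lambda r: r[1],
        reverse=True,
    )
    new_start = gene_start
    for start, end in candidates:
        if end < new_start:  # gap between reads: the overlap chain is broken
            break
        new_start = min(new_start, start)
    return new_start

# Toy example: a gene annotated at 1000-2000 with reads around both ends.
reads = [(1950, 2050), (2040, 2140), (2300, 2400),
         (900, 1010), (850, 950), (600, 700)]
new_start = extend_start(1000, reads)
new_end = extend_end(2000, reads)
print(new_start, new_end)
```

In this toy case the reads at 2300-2400 and 600-700 are ignored because they do not overlap the chain of reads touching the gene, so the extension stops at the most distal overlapping read on each side.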

The FIR distribution for genes in the H. pseudoalbidus genome can be seen below and is indicative of a single-speed genome, with genes encoding secreted proteins dispersed in both gene-sparse and gene-dense regions of the genome.



Figure. The single-speed H. pseudoalbidus genome. Distribution of H. pseudoalbidus genes according to the length of their 5′ and 3′ flanking intergenic regions (FIRs). Red circles, core genes; blue circles, genes encoding predicted secreted proteins.

Identification of protein-coding genes putatively involved in infection by combining metagenomics analysis and protein orthologue clustering.


Christine Sambles and David Studholme. University of Exeter, Devon.


In order to identify fungal protein-coding genes associated with Fraxinus:Hymenoscyphus in planta interactions, we took an orthologue clustering approach. Identifying fungal transcripts that are present in four samples taken from infected ash, and removing those that are also present in the KW1 isolate, could reveal some infection-related transcripts from H. pseudoalbidus. Additionally, F. excelsior transcripts that are present in the infected material and absent from F. excelsior with no signs of infection could identify transcripts involved in the plant's response to infection by H. pseudoalbidus.


Transcriptome assemblies:

F. excelsior: ATU1

C. fraxinea:  KW1

Mixed material: AT1AT2UptonHolt

Output from BLASTX searches against GenBank:

F. excelsior: ATU1

C. fraxinea: KW1

Mixed material: AT1AT2UptonHolt

Methods & Results

We used MEGAN, as previously described, to assign transcripts to taxonomic bins. These transcripts came from five transcript assemblies:

  • 1 H. pseudoalbidus isolate (KW1) and
  • 4 mixed material (AT1, AT2, Holt & Upton).

This resulted in 36,945 transcripts being allocated to the bin for order Helotiales.

The longest open reading frame for each Helotiales-binned transcript (Table 1) was translated into a predicted protein sequence. These protein sequences were clustered using OrthoMCL.
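
A minimal sketch of the longest-ORF step is given below. The post does not say how ORFs were defined, so the definition here is an assumption: the standard genetic code, all six reading frames, and an ORF running from the first ATG to the next stop codon (or to the end of the transcript). All function names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(seq):
    """Translate in frame 0; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )

def longest_orf_protein(transcript):
    """Longest Met-initiated peptide across all six reading frames."""
    seq = transcript.upper()
    best = ""
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            for piece in translate(strand[frame:]).split("*"):
                start = piece.find("M")
                if start == -1:
                    continue  # no start codon in this stop-free stretch
                candidate = piece[start:]
                if len(candidate) > len(best):
                    best = candidate
    return best

print(longest_orf_protein("ATGGCTAAATGA"))
```

Other ORF definitions (for example, the longest stop-free stretch regardless of start codon) would give different peptides for incomplete transcripts, so the choice matters when assemblies are truncated at the 5′ end.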

Table 1: Numbers of transcripts and percentages of all transcripts for each sample or isolate that were binned to the order Helotiales using MEGAN.

OrthoMCL analysis

Between 4,548 and 5,551 proteins were clustered from each sample; the number of protein clusters was 6,505 in total. A Venn diagram of the clustered proteins can be seen in Figure 1.


Fig 1: Venn diagram of Helotiales-binned proteins clustered with OrthoMCL for one H. pseudoalbidus isolate (KW1) and four mixed material samples from H. pseudoalbidus infected F. excelsior (AT1, AT2, Holt and Upton).

There was a core set of 3,118 protein clusters from detectable transcripts. A set of 113 protein clusters was identified only in H. pseudoalbidus samples that were from infected F. excelsior (AT1, AT2, Holt & Upton) and 33 only identified in KW1, a H. pseudoalbidus isolate. These will be referred to as the ‘in planta’ and ‘ex planta’ groups respectively.
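
The grouping of clusters into core, in planta and ex planta sets can be expressed as simple set comparisons on the samples contributing to each cluster. The sketch below is one possible reading, not the analysis itself: it assumes an in planta cluster contains proteins from all four infected samples and none from KW1, although a looser reading (any subset of the infected samples, without KW1) is equally plausible from the text. Cluster names and memberships are made up.

```python
INFECTED = frozenset({"AT1", "AT2", "Holt", "Upton"})
ISOLATE = frozenset({"KW1"})

def classify_cluster(samples):
    """Classify a cluster by the set of samples contributing proteins to it."""
    samples = frozenset(samples)
    if samples == INFECTED | ISOLATE:
        return "core"
    if samples == INFECTED:   # assumption: all four infected samples, no KW1
        return "in planta"
    if samples == ISOLATE:
        return "ex planta"
    return "other"

# Hypothetical cluster memberships for illustration only.
clusters = {
    "cluster_a": {"AT1", "AT2", "Holt", "Upton", "KW1"},
    "cluster_b": {"AT1", "AT2", "Holt", "Upton"},
    "cluster_c": {"KW1"},
    "cluster_d": {"AT2", "Upton", "KW1"},
}
groups = {name: classify_cluster(members) for name, members in clusters.items()}
print(groups)
```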

The 113 protein clusters found only in H. pseudoalbidus infected F. excelsior (in planta) contained a total of 565 transcripts (459 excluding isoforms).  We annotated the transcript sequences based on results of BLASTX searches. Additionally the GO, EC, KEGG, PFAM and CAZy (Carbohydrate-Active enzymes) databases were used to annotate the full set of 565 transcripts.

GO, EC and KEGG annotations were inferred using annot8r (Schmid and Blaxter 2008), PFAM domains were identified with Pfam scan (a wrapper script around hmmpfam) and CAZy-family members were annotated using the CAZymes Analysis Toolkit (CAT) (Park, Karpinets et al. 2010).

GO analysis revealed a reduction in growth-related proteins and an increase in cell differentiation and proliferation proteins in infected material (Fig 2).

Figure 2: Gene Ontology (GO) analysis of the pan-proteome (KW1, AT1, AT2, Upton, Holt) compared to in planta proteins. The in planta proteins were translated from Helotiales-binned transcripts (MEGAN) and were identified only in H. pseudoalbidus samples that were from infected F. excelsior (AT1, AT2, Holt & Upton). The pan-proteome proteins were also translated from Helotiales-binned transcripts (MEGAN) and include the isolate, KW1.

PFAM and CAZy analysis of the 565 in planta transcripts resulted in 88 PFAM domains/families and the following CAZy families:

  • Glycosyl hydrolases family 18 (Pfam: Glyco_hydro_18, PF00704)
  • Alcohol dehydrogenase GroES-like domain (Pfam: ADH_N, PF08240) & Zinc-binding dehydrogenase (Pfam: ADH_zinc_N, PF00107)
  • alpha/beta hydrolase fold (Pfam: Abhydrolase_3, PF07859)
  • Protein of unknown function, a putative transmembrane protein from bacteria. It is likely to be conserved between Mycobacterium species (Pfam: DUF2029, PF09594) &  PAP2 superfamily (Pfam: PAP2_3, PF14378)
  • Regulator of chromosome condensation (RCC1) repeat (Pfam: RCC1, PF00415)
  • Chalcone-flavanone isomerase (Pfam: Chalcone, PF02431)
  • Myosin head (motor domain) (Pfam: Myosin_head, PF00063) & Chitin synthase (Pfam: Chitin_synth_2, PF03142)
  • RhgB_N|fn3_3|CBM-like

BLASTX hits from the in planta transcripts included a putative CFEM domain-containing protein (Marssonina brunnea) and a Galactose mutarotase-like protein (Glarea lozoyensis). The Galactose mutarotase-like protein is of interest as it is also similar to rhamnogalacturonate lyase found in Aspergillus spp., an enzyme known to degrade plant cell walls by cleaving the pectin backbone (de Vries and Visser 2001). Some CFEM-containing proteins are proposed to have important roles in fungal pathogenesis (Kulkarni, Kelkar et al. 2003).

Comparisons of Pfam domain content among samples

PFAM domains and families in the ‘pan-proteome’ of KW1, AT1, AT2, Holt & Upton were identified using the hmmpfam wrapper script, Pfam scan. These were compared to the PFAM annotation of the ‘in planta’ group to identify over-representation of specific domains within this group. The domains and families for which >80% of the ‘pan-proteome’ annotations were present in the ‘in planta’ group are shown in Table 1.
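
The over-representation criterion amounts to dividing the number of annotations in the in planta group by the number in the whole pan-proteome and comparing the result to 80%. The short sketch below applies that calculation to the counts reported in this post for four domains (Porphobil_deam 6/6, Copper-bind 3/3, Hpt 5/5 and CFEM 5/14); the variable names are hypothetical.

```python
THRESHOLD = 0.80  # fraction of pan-proteome annotations found in planta

# (in planta count, pan-proteome count), as reported in this post.
counts = {
    "Porphobil_deam": (6, 6),
    "Copper-bind": (3, 3),
    "Hpt": (5, 5),
    "CFEM": (5, 14),
}

fractions = {name: ip / pan for name, (ip, pan) in counts.items()}
enriched = [name for name, frac in fractions.items() if frac > THRESHOLD]

for name, frac in fractions.items():
    print(f"{name}: {100 * frac:.2f}%")
print("above threshold:", enriched)
```

With counts this small, a domain can pass the threshold on the strength of three to six annotations, which is worth keeping in mind when reading the table.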

Table 1: Pfam domains and families in which >80% of ‘pan-proteome’ annotations were present in the ‘in planta’ group

  • ATP12 chaperone protein
  • BOP1NT (NUC169) domain
  • BPG-independent PGAM N-terminus
  • Cdc37 Hsp90 binding domain
  • Cdc37 N terminal kinase binding domain
  • Cdc37 C terminal domain
  • Chalcone-flavanone isomerase
  • Copper binding proteins, plastocyanin/azurin family
  • Flavinator of succinate dehydrogenase
  • HD domain
  • Hpt domain
  • Metalloenzyme superfamily
  • Myosin tail
  • N2,N2-dimethylguanosine tRNA methyltransferase
  • Nuclear protein Es2
  • Outer mitochondrial membrane transport complex protein
  • PAP2 superfamily
  • PMC2NT (NUC016) domain
  • Porphobilinogen deaminase, dipyromethane cofactor binding domain
  • Porphobilinogen deaminase C-terminal domain
  • Protein of unknown function
  • Protein of unknown function
  • Prp31 C terminal domain
  • Ribosomal L32p protein family


Several of the Pfam hits struck us as interesting; these are described below. The pairs of numbers in brackets are the number found within the in planta group / number found in entire ‘pan-proteome’:

Porphobil_deam and Porphobil_deamC (6/6) were found in two AT1 isoforms, AT2, two Holt isoforms and Upton. There were no peptides with this domain in the Helotiales-binned KW1 proteome. Heme-biosynthetic porphobilinogen deaminase protects Aspergillus nidulans from nitrosative stress. In A. nidulans, a novel NO-tolerant (nitric oxide-tolerant) protein PBG-D (the heme biosynthesis enzyme porphobilinogen deaminase) modulates the reduction of environmental NO and nitrite by flavohemoglobin (FHB, encoded by fhbA and fhbB) and nitrite reductase (NiR, encoded by niiA) (Zhou, Narukami et al. 2012). NO is part of the plant hypersensitive response, a localized programmed cell death that confines the pathogen to the site of attempted infection (Mur, Carver et al. 2006).

Proteins matching the ‘copper binding proteins, plastocyanin/azurin’ family (Pfam: Copper-bind, PF00127) (3/3) were found in AT1, Holt & Upton. OrthoMCL clustered an AT2 protein with them, but the assembled transcript was incomplete at the 5’ end and the PF00127 domain was therefore not present. BLASTX searches indicated amino acid sequence similarity to cupredoxin from Glarea lozoyensis, and HHPred predicts similarity to cucumber stellacyanin. Due to the amino acid sequence similarity between the phytocyanins and fungal laccases, this may potentially be a laccase. White-rot fungi (e.g. Trametes cinnabarina, Trametes versicolor and Phlebia radiata) are reported to produce laccases which degrade lignin (Tuor, Winterhalter et al. 1995; Eggert, Temp et al. 1997), and laccase-mediated detoxification of phytoalexins generated by the plant defence systems has been observed in Botrytis cinerea (Pezet, Pont et al. 1991; Sbaghi, Jeandet et al. 1996; Adrian, Rajaei et al. 1998; Breuil, Jeandet et al. 1999).

The Hpt domain (Pfam: Hpt, PF01627) (5/5) was identified in two AT1 isoforms, AT2, Upton & Holt.  The histidine-containing phosphotransfer (HPt) domain is a novel protein module with an active histidine residue that mediates phosphotransfer reactions in the two-component signalling systems (Catlett, Yoder et al. 2003).

Although below the threshold of 80%, 35.71% (5/14) of the CFEM domains identified in the ‘pan-proteome’ of KW1, AT1, AT2, Holt & Upton were present in the ‘in planta’ group and none were present in the ‘ex planta’ group. The CFEM domains were distributed across 4 clusters, only one of which is not present in KW1:

ClusterID:         Clustered protein present in:

HELO2454:         AT1, AT2, HOLT, UPTON

HELO4337:         AT1, AT2, HOLT, UPTON, KW1

HELO5213:         AT1, HOLT, UPTON, KW1

HELO5952:         AT2, UPTON, KW1


Fig 2: Phylogenetic tree of H. pseudoalbidus sequences from four OrthoMCL clusters where at least one sequence in the cluster contains a CFEM domain (Pfam: PF05730). The names of full-length proteins are shown in black; in grey are names of shorter length proteins from incomplete transcript assembly that lack a CFEM domain but that cluster with CFEM domain sequences due to sequence similarity and inferred orthology. Orthologue clustering was performed on all translated transcripts binned to the Helotiales using MEGAN from the one H. pseudoalbidus isolate (KW1) and all four H. pseudoalbidus samples that were from infected F. excelsior (AT1, AT2, Holt & Upton).

The 33 clusters (representing 72 peptides) in the ex planta group which were only identified in the isolate KW1 were annotated with PFAM as previously described. This resulted in identification of 17 Pfam domains/families (Table 2).

Table 2: Pfam domains/families identified in the ex planta group

  • Cytochrome C and Quinol oxidase polypeptide I
  • DASH complex subunit Spc34
  • Pentapeptide repeats
  • Vacuolar segregation subunit 7 P
  • 3-dehydroquinate synthase
  • Bacterial low temperature requirement A protein
  • Serine hydrolase
  • Common central domain of tyrosinase
  • Glycosyl hydrolase family 47
  • Domain of unknown function
  • SET domain
  • alpha/beta hydrolase fold
  • Enoyl-(Acyl carrier protein) reductase
  • Glycosyl hydrolase family 3 N terminal domain
  • Zinc-binding dehydrogenase
  • ATPase family associated with various cellular activities
  • short chain dehydrogenase


Because so few peptides were found only in KW1 and in none of the H. pseudoalbidus infected ash samples, the scope for comparative analysis of the ex planta group is limited.


Proteins putatively involved in plant-pathogen interactions have been identified from groups of translated transcripts that were found exclusively in planta and not in isolate KW1. They included a copper binding protein within the plastocyanin/azurin family, porphobilinogen deaminase, a CFEM domain-containing protein and a Galactose mutarotase-like protein.


Adrian, M., H. Rajaei, et al. (1998). "Resveratrol Oxidation in Botrytis cinerea Conidia." Phytopathology 88: 472-476.

Breuil, A. C., P. Jeandet, et al. (1999). "Characterization of a Pterostilbene Dehydrodimer Produced by Laccase of Botrytis cinerea." Phytopathology 89: 298-302.

Catlett, N. L., O. C. Yoder, et al. (2003). "Whole-genome analysis of two-component signal transduction genes in fungal pathogens." Eukaryotic cell 2: 1151-1161.

de Vries, R. P. and J. Visser (2001). "Aspergillus Enzymes Involved in Degradation of Plant  Cell Wall Polysaccharides." Microbiology and Molecular Biology Reviews 65: 497-522.

Eggert, C., U. Temp, et al. (1997). "Laccase is essential for lignin degradation by the white-rot fungus Pycnoporus cinnabarinus." FEBS Letters 407: 89-92.

Kulkarni, R. D., H. S. Kelkar, et al. (2003). "An eight-cysteine-containing CFEM domain unique to a group of fungal membrane proteins." Trends in Biochemical Sciences 28: 118-121.

Mur, L. A. J., T. L. W. Carver, et al. (2006). "NO way to live; the various roles of nitric oxide in plant-pathogen interactions." Journal of experimental botany 57: 489-505.

Park, B. H., T. V. Karpinets, et al. (2010). "CAZymes Analysis Toolkit (CAT): web service for searching and analyzing carbohydrate-active enzymes in a newly sequenced organism using CAZy database." Glycobiology 20: 1574-1584.

Pezet, R., V. Pont, et al. (1991). "Evidence for oxidative detoxication of pterostilbene and resveratrol by a laccase-like stilbene oxidase produced by Botrytis cinerea." Physiological and Molecular Plant Pathology 39: 441-450.

Sbaghi, M., P. Jeandet, et al. (1996). "Degradation of stilbene‐type phytoalexins in relation to the pathogenicity of Botrytis cinerea to grapevines." Plant Pathology: 139-144.

Schmid, R. and M. L. Blaxter (2008). "annot8r: GO, EC and KEGG annotation of EST datasets." BMC bioinformatics 9: 180.

Tuor, U., K. Winterhalter, et al. (1995). "Enzymes of white-rot fungi involved in lignin degradation and ecological determinants for wood decay." Journal of Biotechnology 41: 1-17.

Zhou, S., T. Narukami, et al. (2012). "Heme-Biosynthetic Porphobilinogen Deaminase Protects Aspergillus nidulans from Nitrosative Stress." Applied and Environmental Microbiology 78: 103-109.