
Identification of protein-coding genes putatively involved in infection by combining metagenomics analysis and protein orthologue clustering.

Contributors

Christine Sambles and David Studholme. University of Exeter, Devon.

Introduction

In order to identify fungal protein-coding genes associated with Fraxinus:Hymenoscyphus in planta interactions, we took an orthologue clustering approach. Identifying fungal transcripts that are present in four samples taken from infected ash, and removing transcripts that are also present in the KW1 isolate, could reveal some infection-related transcripts from H. pseudoalbidus. Additionally, F. excelsior transcripts present in the infected material and absent from F. excelsior with no signs of infection could identify transcripts involved in the plant's response to infection by H. pseudoalbidus.

Material

Transcriptome assemblies:

F. excelsior: ATU1

C. fraxinea:  KW1

Mixed material: AT1AT2UptonHolt

Output from BLASTX searches against GenBank:

F. excelsior: ATU1

C. fraxinea: KW1

Mixed material: AT1AT2UptonHolt

Methods & Results

We used MEGAN, as previously described (http://oadb.tsl.ac.uk/?p=704), to assign transcripts to taxonomic bins. These transcripts came from five transcript assemblies:

  • 1 H. pseudoalbidus isolate (KW1) and
  • 4 mixed material (AT1, AT2, Holt & Upton).

This resulted in 36,945 transcripts being allocated to the bin for order Helotiales.

The longest open reading frame for each Helotiales-binned transcript (Table 1) was translated into a predicted protein sequence. These protein sequences were clustered using OrthoMCL.
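The ORF step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the post does not say which tool or ORF definition was used, so here an ORF is assumed to run from an ATG to the first in-frame stop codon (or to the end of the transcript), all six frames are searched, and the standard genetic code is used. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (NCBI table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf_protein(transcript):
    """Return the longest ATG-initiated peptide found in any of the six frames.

    Assumption: an ORF starts at ATG and ends at the first in-frame stop,
    or at the transcript end if the assembly is truncated.
    """
    seq = transcript.upper()
    best = ""
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            peptide = None  # None while we are outside an ORF
            for i in range(frame, len(strand) - 2, 3):
                aa = CODON_TABLE.get(strand[i:i + 3], "X")  # X for ambiguous codons
                if peptide is None:
                    if aa == "M":
                        peptide = "M"
                elif aa == "*":
                    best = max(best, peptide, key=len)
                    peptide = None
                else:
                    peptide += aa
            if peptide:  # ORF that runs off the end of the transcript
                best = max(best, peptide, key=len)
    return best


print(longest_orf_protein("CCATGGCTTAAGG"))
```

A dedicated ORF finder would normally be used on real assemblies; the sketch only makes the "longest ORF, then translate" logic explicit.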

Table 1: Numbers of transcripts and percentages of all transcripts for each sample or isolate that were binned to the order Helotiales using MEGAN.


Sample              AT1      AT2      Holt     Upton    KW1      ATU1
Helotiales          8,214    7,403    6,930    7,410    6,561    0
% all transcripts   15.61%   8.80%    6.44%    12.25%   31.75%   0.00%

OrthoMCL analysis

Between 4,548 and 5,551 proteins were clustered from each sample; the number of protein clusters was 6,505 in total. A Venn diagram of the clustered proteins can be seen in Figure 1.


Fig 1: Venn diagram of Helotiales-binned proteins clustered with OrthoMCL for one H. pseudoalbidus isolate (KW1) and four mixed material samples from H. pseudoalbidus infected F. excelsior (AT1, AT2, Holt and Upton).

There was a core set of 3,118 protein clusters from detectable transcripts. A set of 113 protein clusters was identified only in H. pseudoalbidus samples that were from infected F. excelsior (AT1, AT2, Holt & Upton) and 33 only identified in KW1, a H. pseudoalbidus isolate. These will be referred to as the ‘in planta’ and ‘ex planta’ groups respectively.
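The grouping described above amounts to simple set logic on the samples represented in each OrthoMCL cluster. The sketch below is a hypothetical illustration, not the script used for this analysis. It assumes that "core" means present in all five assemblies, "ex planta" means present only in KW1, and "in planta" means present in at least one mixed sample and absent from KW1. The cluster IDs are taken from the CFEM cluster list later in this post.

```python
from collections import Counter

IN_PLANTA_SAMPLES = {"AT1", "AT2", "Holt", "Upton"}
ISOLATE = "KW1"


def classify_cluster(samples_present):
    """Classify one cluster by the set of samples that contribute proteins to it."""
    samples = set(samples_present)
    if samples == IN_PLANTA_SAMPLES | {ISOLATE}:
        return "core"
    if samples == {ISOLATE}:
        return "ex planta"
    if samples and ISOLATE not in samples:
        # Assumption: any subset of the mixed samples counts as in planta.
        return "in planta"
    return "other"


# Cluster membership as reported for the CFEM-containing clusters.
clusters = {
    "HELO2454": {"AT1", "AT2", "Holt", "Upton"},
    "HELO4337": {"AT1", "AT2", "Holt", "Upton", "KW1"},
    "HELO5213": {"AT1", "Holt", "Upton", "KW1"},
    "HELO5952": {"AT2", "Upton", "KW1"},
}

groups = {cid: classify_cluster(members) for cid, members in clusters.items()}
print(groups)
print(Counter(groups.values()))
```

Under these assumptions only HELO2454 falls in the in planta group, which agrees with the statement that one of the four CFEM clusters is absent from KW1.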

The 113 protein clusters found only in H. pseudoalbidus infected F. excelsior (in planta) contained a total of 565 transcripts (459 excluding isoforms).  We annotated the transcript sequences based on results of BLASTX searches. Additionally the GO, EC, KEGG, PFAM and CAZy (Carbohydrate-Active enzymes) databases were used to annotate the full set of 565 transcripts.

GO, EC and KEGG annotation were inferred using annot8r (Schmid and Blaxter 2008), PFAM domains were identified with Pfam scan (a wrapper script around hmmpfam) and CAZy-family members were annotated using the CAZYmes Analysis Toolkit (CAT) (Park, Karpinets et al. 2010).

GO analysis revealed a reduction in growth-related proteins and an increase in cell differentiation and proliferation proteins in infected material (Fig 2).

Figure 2: Gene Ontology (GO) analysis of the pan-proteome (KW1, AT1, AT2, Upton, Holt) compared to in planta proteins. The in planta proteins were translated from Helotiales-binned transcripts (MEGAN) and were identified only in H. pseudoalbidus samples that were from infected F. excelsior (AT1, AT2, Holt & Upton). The pan-proteome proteins were also translated from Helotiales-binned transcripts (MEGAN) and include the isolate, KW1.

PFAM and CAZy analysis of the 565 in planta transcripts resulted in 88 PFAM domains/families and the following CAZy families:

  • Glycosyl hydrolases family 18 (Pfam: Glyco_hydro_18, PF00704)
  • Alcohol dehydrogenase GroES-like domain (Pfam: ADH_N, PF08240) & Zinc-binding dehydrogenase (Pfam: ADH_zinc_N, PF00107)
  • alpha/beta hydrolase fold (Pfam: Abhydrolase_3, PF07859)
  • Protein of unknown function, a putative transmembrane protein from bacteria. It is likely to be conserved between Mycobacterium species (Pfam: DUF2029, PF09594) &  PAP2 superfamily (Pfam: PAP2_3, PF14378)
  • Regulator of chromosome condensation (RCC1) repeat (Pfam: RCC1, PF00415)
  • Chalcone-flavanone isomerase (Pfam: Chalcone, PF02431)
  • Myosin head (motor domain) (Pfam: Myosin_head, PF00063) & Chitin synthase (Pfam: Chitin_synth_2, PF03142)
  • RhgB_N|fn3_3|CBM-like.

BLASTX hits from the in planta transcripts included a putative CFEM domain-containing protein (Marssonina brunnea) and a Galactose mutarotase-like protein (Glarea lozoyensis). The Galactose mutarotase-like protein is of interest as it is also similar to rhamnogalacturonate lyase from Aspergillus spp., an enzyme known to degrade plant cell walls by cleaving the pectin backbone (de Vries and Visser 2001). Some CFEM-containing proteins are proposed to have important roles in fungal pathogenesis (Kulkarni, Kelkar et al. 2003).

Comparisons of Pfam domain content among samples

PFAM domains and families in the ‘pan-proteome’ of KW1, AT1, AT2, Holt & Upton were identified using the hmmpfam wrapper script, Pfam scan. These were compared to the PFAM annotation of the ‘in planta’ group to identify over-representation of specific domains within this group. The domains and families in which >80% annotations were present in the ‘in planta’ group when compared to the ‘pan-proteome’ are shown in Table 1.

Table 1: Pfam domains and families in which >80% ‘pan-proteome’ annotations were present in the ‘in planta’ group (http://pfam.sanger.ac.uk/).


Domain/Family       Pfam accession  Name
ATP12               PF07542         ATP12 chaperone protein
BOP1NT              PF08145         BOP1NT (NUC169) domain
iPGM_N              PF06415         BPG-independent PGAM N-terminus
CDC37_M             PF08565         Cdc37 Hsp90 binding domain
CDC37_N             PF03234         Cdc37 N terminal kinase binding domain
CDC37_C             PF08564         Cdc37 C terminal domain
Chalcone            PF02431         Chalcone-flavanone isomerase
Copper-bind         PF00127         Copper binding proteins, plastocyanin/azurin family
Sdh5                PF03937         Flavinator of succinate dehydrogenase
HD_3                PF13023         HD domain
Hpt                 PF01627         Hpt domain
Metalloenzyme       PF01676         Metalloenzyme superfamily
CENP-I              PF07778         Mis6
Myosin_tail_1       PF01576         Myosin tail
TRM                 PF02005         N2,N2-dimethylguanosine tRNA methyltransferase
Es2                 PF09751         Nuclear protein Es2
Tom37               PF10568         Outer mitochondrial membrane transport complex protein
PAP2_3              PF14378         PAP2 superfamily
PMC2NT              PF08066         PMC2NT (NUC016) domain
Porphobil_deam      PF01379         Porphobilinogen deaminase, dipyromethane cofactor binding domain
Porphobil_deam(C)   PF03900         Porphobilinogen deaminase C-terminal domain
DUF2012             PF09430         Protein of unknown function
DUF775              PF05603         Protein of unknown function
Prp31_C             PF09785         Prp31 C terminal domain
Ribosomal_L32p      PF01783         Ribosomal L32p protein family

Several of the Pfam hits struck us as interesting; these are described below. The pairs of numbers in brackets are the number found within the in planta group / number found in entire ‘pan-proteome’:

Porphobil_deam and Porphobil_deamC (6/6) were found in two AT1 isoforms, AT2, two Holt isoforms and Upton. There were no peptides with this domain in the Helotiales-binned KW1 proteome. Heme-biosynthetic porphobilinogen deaminase protects Aspergillus nidulans from nitrosative stress. In A. nidulans, a novel NO-tolerant (nitric oxide-tolerant) protein PBG-D (the heme biosynthesis enzyme porphobilinogen deaminase) modulates the reduction of environmental NO and nitrite by flavohemoglobin (FHB, encoded by fhbA and fhbB) and nitrite reductase (NiR, encoded by niiA) (Zhou, Narukami et al. 2012). NO is part of the plant hypersensitive response, a localized programmed cell death that confines the pathogen to the site of attempted infection (Mur, Carver et al. 2006).

Proteins matching the ‘copper binding proteins, plastocyanin/azurin’ family (Pfam: Copper-bind, PF00127) (3/3) were found in AT1, Holt & Upton. OrthoMCL clustered an AT2 protein with them, but the assembled transcript was incomplete at the 5’ end and the PF00127 domain was therefore not present. BLASTX searches indicated amino acid sequence similarity to cupredoxin from Glarea lozoyensis, and HHPred predicts similarity to cucumber stellacyanin. Due to the amino acid sequence similarity between the phytocyanins and fungal laccases, this may potentially be a laccase. White-rot fungi (e.g. Trametes cinnabarina, Trametes versicolor and Phlebia radiata) are reported to produce laccases which degrade lignin (Tuor, Winterhalter et al. 1995; Eggert, Temp et al. 1997), and laccase-mediated detoxification of phytoalexins generated by the plant defence systems has been observed in Botrytis cinerea (Pezet, Pont et al. 1991; Sbaghi, Jeandet et al. 1996; Adrian, Rajaei et al. 1998; Breuil, Jeandet et al. 1999).

The Hpt domain (Pfam: Hpt, PF01627) (5/5) was identified in two AT1 isoforms, AT2, Upton & Holt.  The histidine-containing phosphotransfer (HPt) domain is a novel protein module with an active histidine residue that mediates phosphotransfer reactions in the two-component signalling systems (Catlett, Yoder et al. 2003).

Although below the threshold of 80%, 35.71% (5/14) of the CFEM domains identified in the ‘pan-proteome’ of KW1, AT1, AT2, Holt & Upton were present in the ‘in planta’ group and none were present in the ‘ex planta’ group. The CFEM domains were distributed across 4 clusters, only one of which is not present in KW1:

ClusterID:         Clustered protein present in:

HELO2454:         AT1, AT2, HOLT, UPTON

HELO4337:         AT1, AT2, HOLT, UPTON, KW1

HELO5213:         AT1, HOLT, UPTON, KW1

HELO5952:         AT2, UPTON, KW1
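The bracketed pairs quoted above (in planta count / pan-proteome count) and the 80% threshold can be reproduced with a short calculation. This is a minimal sketch using only the counts reported in the text; the function and variable names are hypothetical.

```python
def in_planta_percent(in_planta_count, pan_proteome_count):
    """Percentage of pan-proteome annotations of a domain found in the in planta group."""
    return 100.0 * in_planta_count / pan_proteome_count


def over_represented(counts, threshold_percent=80.0):
    """Return the domains whose in planta share exceeds the threshold.

    counts maps a domain name to (in planta count, pan-proteome count).
    """
    return sorted(
        domain
        for domain, (in_planta, pan) in counts.items()
        if pan > 0 and in_planta_percent(in_planta, pan) > threshold_percent
    )


# Counts quoted in the text above.
domain_counts = {
    "Porphobil_deam": (6, 6),
    "Copper-bind": (3, 3),
    "Hpt": (5, 5),
    "CFEM": (5, 14),
}

for name, (in_planta, pan) in domain_counts.items():
    print(name, round(in_planta_percent(in_planta, pan), 2))
print(over_represented(domain_counts))
```

With such small counts (3 to 6 annotations per domain) the percentages are coarse, so they are best read as pointers to candidates rather than as statistical evidence of enrichment.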

 

Fig 2: Phylogenetic tree of H. pseudoalbidus sequences from four OrthoMCL clusters where at least one sequence in the cluster contains a CFEM domain (Pfam: PF05730). The names of full-length proteins are shown in black; in grey are names of shorter length proteins from incomplete transcript assembly that lack a CFEM domain but that cluster with CFEM domain sequences due to sequence similarity and inferred orthology. Orthologue clustering was performed on all translated transcripts binned to the Helotiales using MEGAN from the one H. pseudoalbidus isolate (KW1) and all four H. pseudoalbidus samples that were from infected F. excelsior (AT1, AT2, Holt & Upton).

The 33 clusters (representing 72 peptides) in the ex planta group which were only identified in the isolate KW1 were annotated with PFAM as previously described. This resulted in identification of 17 Pfam domains/families (Table 2).

Table 2: Pfam domains/families identified in the ex planta group


Domain/Family       Pfam accession  Name
COX1                PF00115         Cytochrome C and Quinol oxidase polypeptide I
DASH_Spc34          PF08657         DASH complex subunit Spc34
Pentapeptide_4      PF13599         Pentapeptide repeats
Vac7                PF12751         Vacuolar segregation subunit 7 P
DHQ_synthase        PF01761         3-dehydroquinate synthase
LtrA                PF06772         Bacterial low temperature requirement A protein
FSH1                PF03959         Serine hydrolase
Tyrosinase          PF00264         Common central domain of tyrosinase
Glyco_hydro_47      PF01532         Glycosyl hydrolase family 47
DUF202              PF02656         Domain of unknown function
SET                 PF00856         SET domain
Abhydrolase_1       PF00561         alpha/beta hydrolase fold
adh_short_C2        PF13561         Enoyl-(Acyl carrier protein) reductase
Glyco_hydro_3       PF00933         Glycosyl hydrolase family 3 N terminal domain
ADH_zinc_N          PF00107         Zinc-binding dehydrogenase
AAA                 PF00004         ATPase family associated with various cellular activities
adh_short           PF00106         short chain dehydrogenase

The low number of peptides found only in KW1, and not in any of the H. pseudoalbidus-infected ash samples, limits the ability to perform any comparative analysis.

Conclusions

Proteins putatively involved in plant-pathogen interactions have been identified from groups of translated transcripts exclusively found in planta and were not identified in isolate KW1. They included a copper binding protein within the plastocyanin/azurin family, porphobilinogen deaminase, a CFEM domain-containing protein and a Galactose mutarotase-like protein.

References

Adrian, M., H. Rajaei, et al. (1998). "Resveratrol Oxidation in Botrytis cinerea Conidia." Phytopathology 88: 472-476.

Breuil, A. C., P. Jeandet, et al. (1999). "Characterization of a Pterostilbene Dehydrodimer Produced by Laccase of Botrytis cinerea." Phytopathology 89: 298-302.

Catlett, N. L., O. C. Yoder, et al. (2003). "Whole-genome analysis of two-component signal transduction genes in fungal pathogens." Eukaryotic cell 2: 1151-1161.

de Vries, R. P. and J. Visser (2001). "Aspergillus Enzymes Involved in Degradation of Plant  Cell Wall Polysaccharides." Microbiology and Molecular Biology Reviews 65: 497-522.

Eggert, C., U. Temp, et al. (1997). "Laccase is essential for lignin degradation by the white-rot fungus Pycnoporus cinnabarinus." FEBS Letters 407: 89-92.

Kulkarni, R. D., H. S. Kelkar, et al. (2003). "An eight-cysteine-containing CFEM domain unique to a group of fungal membrane proteins." Trends in Biochemical Sciences 28: 118-121.

Mur, L. A. J., T. L. W. Carver, et al. (2006). "NO way to live; the various roles of nitric oxide in plant-pathogen interactions." Journal of experimental botany 57: 489-505.

Park, B. H., T. V. Karpinets, et al. (2010). "CAZymes Analysis Toolkit (CAT): web service for searching and analyzing carbohydrate-active enzymes in a newly sequenced organism using CAZy database." Glycobiology 20: 1574-1584.

Pezet, R., V. Pont, et al. (1991). "Evidence for oxidative detoxication of pterostilbene and resveratrol by a laccase-like stilbene oxidase produced by Botrytis cinerea." Physiological and Molecular Plant Pathology 39: 441-450.

Sbaghi, M., P. Jeandet, et al. (1996). "Degradation of stilbene‐type phytoalexins in relation to the pathogenicity of Botrytis cinerea to grapevines." Plant Pathology: 139-144.

Schmid, R. and M. L. Blaxter (2008). "annot8r: GO, EC and KEGG annotation of EST datasets." BMC bioinformatics 9: 180.

Tuor, U., K. Winterhalter, et al. (1995). "Enzymes of white-rot fungi involved in lignin degradation and ecological determinants for wood decay." Journal of Biotechnology 41: 1-17.

Zhou, S., T. Narukami, et al. (2012). "Heme-Biosynthetic Porphobilinogen Deaminase Protects Aspergillus nidulans from Nitrosative Stress." Applied and Environmental Microbiology 78: 103-109.


Regression analysis of gene expression values against disease symptoms

The contributors

Andrea Harper (CNAP, University of York) and Ian Bancroft (CNAP, University of York)

The analysis

The analysis used computed RPKM transcript abundance data for 130,978 principal isoforms in leaves from each of 184 trees sampled from across Denmark and phenotyped for disease symptoms by our colleague Erik Dahl Kjaer’s group. A regression analysis was conducted to identify correlations between gene expression levels and dieback susceptibility. Those with the most significant correlations (P < 10^-8) are listed.
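As an illustration of the kind of per-transcript test described above, the sketch below fits a simple least-squares line of expression against a disease score and reports the correlation and its t statistic. It is a sketch under assumptions: the post does not state the exact model, which variable was treated as the response, or how P values were derived. The data are made up for the example and the function names are hypothetical.

```python
import math


def linear_regression(x, y):
    """Ordinary least-squares fit of y on x; returns (slope, intercept, pearson_r)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


def t_statistic(r, n):
    """t statistic for testing zero correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical data: disease score per tree and RPKM of one transcript.
disease_score = [1, 2, 3, 4]
rpkm = [2.0, 4.0, 5.0, 9.0]

slope, intercept, r = linear_regression(disease_score, rpkm)
print(slope, intercept, r, t_statistic(r, len(rpkm)))
```

Converting the t statistic to a P value requires the t distribution (available in statistics libraries). Because 130,978 transcripts are tested, a stringent threshold such as the P < 10^-8 cut-off used here is needed to limit false positives.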

Potential inhibition of a putative alternative oxidase identified in C. fraxinea reduces fungal growth in culture

The contributors

Mary Albury, Luke Young, Julia Shearman, Ben May and Tony Moore (University of Sussex); Kentaro Yoshida and Diane Saunders (The Sainsbury Laboratory)

The material

The Chalara fraxinea isolate KW1 was tested for growth in the presence/absence of two compounds; a traditional fungicide (compound A) or an inhibitor of the putative alternative oxidase protein identified in the C. fraxinea proteome (http://oadb.tsl.ac.uk/?p=526) (compound C).

The analysis

Agar plugs of C. fraxinea KW1 isolate (5 mm diameter) were used to inoculate glucose agar plates or potato dextrose broth, supplemented with 1, 1.5 or 2 µM of compounds A or C (Figure 1). DMSO was used as a negative control. The presence of compound A reduced C. fraxinea KW1 growth.

To further assess the inhibitory activity of the two compounds, agar plugs of C. fraxinea KW1 isolate (5 mm diameter) were used to inoculate glucose agar plates supplemented with 1.5 µM of both compounds A and C or DMSO. Agar plugs in potato dextrose broth were supplemented with 1.5 µM of compound A, C, both combined or DMSO (Figure 2). When both compounds were combined there was a significant reduction in C. fraxinea growth in culture.

Three biological and three technical replicates were undertaken for each experiment.


Figure 1. Compound A reduced C. fraxinea KW1 isolate growth in culture. Agar plugs of C. fraxinea KW1 isolate (5 mm diameter) were used to inoculate glucose agar plates or potato dextrose broth, supplemented with 1, 1.5 or 2 µM of compounds A or C or DMSO. Pictures captured 12 days post-inoculation.


 

Figure 2. Combining compounds A and C significantly reduced C. fraxinea KW1 growth in culture. Agar plugs of C. fraxinea KW1 isolate (5 mm diameter) were used to inoculate glucose agar plates or potato dextrose broth. The plates were supplemented with 1.5 µM of compounds A and C or DMSO. The liquid cultures were supplemented with 1.5 µM of compound A, C, both combined or DMSO. Pictures captured 15 days post-inoculation.

Repeated mRNA-Seq analysis of Tree 35

The contributors

Martin Trick (JIC), Andrea Harper (CNAP, University of York), Leah Clissold (TGAC) and Ian Bancroft (CNAP, University of York)

The material

Young leaf material was harvested from a clone of Tree 35 in Denmark in 2013.

The analysis

mRNA was extracted and a paired-end (but this time not strand-specific) Illumina RNA-Seq library was constructed. About 125 million read pairs were obtained from a single HiSeq 2500 lane – the raw data are available from The Sainsbury Laboratory’s FTP server, with details in the GitHub repository here. Trinity was again used to assemble transcripts from the complete set of reads, this time generating 242,115 assemblies, and then RSEM transcript abundance analysis was carried out to select 130,978 principal isoforms which constitute our new reference sequence. 96% of the transcripts could be located on scaffolds in the Tree 35 genome assembly developed by TGAC. Candidate open reading frames were extracted and the predicted peptides were queried against the UniProt protein database with BLASTP to produce a functional annotation.

We have now sequenced the leaf transcriptomes of 186 trees that have been sampled from across Denmark and phenotyped for disease symptoms by our colleague Erik Dahl Kjaer’s group. SNPs and expression levels with respect to the Tree 35 reference have been calculated and we are about to start on the association work.

Variant analysis of different isolates and fruiting bodies of Chalara fraxinea

Contributors

Christine Sambles, David Studholme. University of Exeter, Devon.

Introduction

To identify the extent of genetic variation in the sequenced samples, we undertook variant analysis from seven isolates (KW1, FERA 105, FERA 232, FERA 233, FERA 88, FERA 93 & FERA 94), two fruiting body samples (MFB1 & PFB1) and four mixed material samples (HP1, UB1, AT1 and AT2).

Materials

Raw reads:

C. fraxinea:  KW1

Fruiting bodies: MFB1, PFB1

Mixed material: AT1, AT2, Upton, Holt

 

Genome assembly

C. fraxinea scaffolds:  KW1

Methods

The quality of raw reads from the thirteen samples was assessed with FastQC. Adapter and quality trimming was performed with Trim Galore, a wrapper script using FastQC and cutadapt (Phred cutoff: 20, error rate: 0.1, adapter overlap: 1 bp, min. length: 20 bp, paired read length cut-off: 35 bp). FastQC was automatically run on all trimmed files to confirm trimmed read quality. Raw and trimmed read metrics and the GC content of raw reads can be seen in Table 1.

Trimmed reads were aligned to the pre-assembled KW1 genome from OADB using the splice-aware aligner TopHat (Trapnell, et al., 2009). The resulting alignment BAM file was used to create a pileup file using mpileup from SAMtools.

The variant detection software VarScan2 pileup2snp (or mpileup2snp) was used (--p-value 0.05 --min-coverage 10 --output-vcf --min-var-freq 0.95) to call SNPs (Koboldt, et al., 2012). The numbers of SNPs were normalised to the number of bases with >=10X coverage to take into account the different depths of sequencing.
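The two thresholds and the normalisation step described above can be written out as a small sketch. This is not VarScan's implementation: it ignores base-quality filtering and the P value, and the choice of "SNPs per megabase" as the unit is an assumption made here for readability (the text only says the counts were normalised to the number of bases with >=10X coverage). The example counts are taken from Table 2 of this post.

```python
def passes_filters(depth, variant_reads, min_coverage=10, min_var_freq=0.95):
    """Simplified site filter: enough reads, and nearly all support the variant."""
    if depth < min_coverage:
        return False
    return variant_reads / depth >= min_var_freq


def snps_per_megabase(snp_count, covered_bases):
    """Normalise a SNP count to the number of bases covered at >=10X."""
    return snp_count / covered_bases * 1_000_000


# (number of SNPs, number of bases with >=10X coverage) from Table 2.
samples = {
    "KW1": (59, 19_974_663),
    "FERA 232": (53, 169_295),
    "MFB1": (11_852, 16_519_209),
}

for name, (snps, bases) in samples.items():
    print(name, round(snps_per_megabase(snps, bases), 1))
```

Normalising in this way matters most for the FERA samples, where fewer than 200,000 positions reach 10X coverage, so raw SNP counts are not comparable with those of the deeply sequenced samples.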

Results

The FERA samples contain fewer sequencing reads than the fruiting body samples, mixed material samples and KW1 (Table 1). Sample KW1 covers 32% of the genome positions at a coverage of >=10X, compared to between 0.17% and 0.27% for the FERA samples. The remaining samples range from 13% (AT2) to 32% (Upton) (Table 1). This suggests that only limited SNP calling is possible in the FERA samples. The KW1 genome assembly has a size of 65Mb and the transcriptome assembly has a size of ~35Mb, suggesting that spliced transcripts represent 55% of the genome size.

Sample                           Sample type     Raw reads x2  GC%  Filtered x2  % 10X KW1 genome positions covered
KW1                              C. fraxinea     41,036,951    49   38,594,279   32%
FERA 105                         C. fraxinea     1,442,015     48   1,440,015    0.26%
FERA 88                          C. fraxinea     1,731,461     47   1,729,869    0.20%
FERA 93                          C. fraxinea     1,355,421     47   1,353,849    0.24%
FERA 94                          C. fraxinea     1,642,453     47   1,639,096    0.17%
FERA 232                         C. fraxinea     1,919,713     48   1,918,416    0.27%
FERA 233                         C. fraxinea     1,838,409     47   1,837,097    0.26%
Mature fruiting body (MFB1)      C. fraxinea     15,033,876    49   14,524,487   27%
Primordial fruiting body (PFB1)  C. fraxinea     7,298,316     49   7,187,891    16%
AT1                              Mixed material  31,059,879    48   30,388,033   29%
AT2                              Mixed material  23,096,021    44   21,924,824   13%
Holt                             Mixed material  36,960,651    49   36,231,834   21%
Upton                            Mixed material  43,099,662    48   41,367,071   32%

Table 1:  Read number pre- and post-filtering, G+C content of raw data and percentage of KW1 genome positions covered at a depth of >=10X.

SNP calling using VarScan revealed a high level of genetic variation between the samples (Table 2). The mature fruiting body (MFB1) and primordial fruiting body (PFB1) samples showed a high level of variation (~16,000 SNPs in total) when aligned to the KW1 genome. Samples AT1 (~6,000), AT2 (~3,000), Holt (~14,000) and Upton (~12,000) also suggested a high level of genetic variation, although this could be due to the alignment of non-Chalara reads present in the mixed samples which have a high degree of similarity to regions of the C. fraxinea genome. Such high levels of genetic variation highlight the importance of having multiple genomes available.

Sample     Sample type     # Unambiguous bases  # SNPs (VarScan)  Total
KW1¹       C. fraxinea     19,974,663           59                NA
FERA 88    C. fraxinea     167,378              40                204
FERA 93    C. fraxinea     127,827              20
FERA 94    C. fraxinea     150,981              47
FERA 105   C. fraxinea     104,667              22
FERA 232   C. fraxinea     169,295              53
FERA 233   C. fraxinea     165,989              35
MFB1       C. fraxinea     16,519,209           11,852            16,396
PFB1       C. fraxinea     9,869,668            9,545
AT1²       Mixed material  18,108,777           6,029             NA
AT2²       Mixed material  8,016,701            3,138
Holt       Mixed material  12,975,549           13,654
Upton      Mixed material  19,789,944           12,259

Table 2: VarScan SNP calling of C. fraxinea isolates, fruiting body and mixed material samples. The Total value is given once per sample group, on the first row of that group. ¹ MAT1-1 mating type; ² MAT1-2 mating type; UAP: unambiguous positions.

We identified 40, 20, 47, 22, 53 and 35 single-nucleotide differences (10X coverage, 95% consensus) for samples FERA 88, FERA 93, FERA 94, FERA 105, FERA 232 and FERA 233, respectively, when compared to the KW1 genome. In total, 204 unique SNPs were identified in the FERA samples when multi-sample calling was used. P-values were calculated using a Fisher’s Exact Test on the read counts supporting reference and variant alleles. Of the 204 SNPs, 165 were located within an annotated gene, 155 of these within annotated exons. These 155 SNPs were distributed across 70 genes, which we are currently annotating (BLAST & PFAM).

One of these genes (CHAFR746836.1.1_0031990) has a sequence that is similar (62% identity) to cerato-platanin from the basidiomycete Trametes versicolor (GenBank: EIW62259.1). Cerato-platanin was first identified in the ascomycete Ceratocystis fimbriata, the causal agent of “canker stain disease” in Platanus x acerifolia in Europe (Pazzagli, et al., 1999; Pazzagli, et al., 2006). It belongs to a family of cerato-platanin phytotoxic proteins which are found in the cell wall of the fungus (Boddi, et al., 2004) and are involved in the host-pathogen interaction (Pazzagli, et al., 1999). Six SNPs were identified in this gene. All the FERA samples except FERA 88 and FERA 105 differ from KW1 at all six SNPs; FERA 88 and FERA 105 are identical to KW1 at all six SNPs (Fig 1). At these six SNPs, the primordial fruiting body sample resembles FERA 105 and FERA 88 and differs from FERA 93, FERA 232, FERA 233, FERA 94 and the mature fruiting body sample. These two fruiting body samples originate from different sources. In the mature fruiting body sample, which is likely to contain both dikaryotic hyphae and haploid ascospores, all six SNPs appear to be heterokaryotic. In all other samples the SNPs are homokaryotic/homozygous.


Fig 1: IGV view of the putative cerato-platanin gene in C. fraxinea. The six SNPs called by VarScan are shown in the top track. The next four tracks show the homokaryotic/homozygous SNPs, which are the same in FERA93, FERA232, FERA233 and FERA94; the sixth track shows possible heterokaryotic SNPs in the mature fruiting body (MFB1); and the last three tracks show that FERA88, FERA105 and the primordial fruiting body (PFB1) are homokaryotic/homozygous for the alleles found in the KW1 genome sequence.

 

Normalisation of the identified SNPs shows that the highest variation occurs in the fruiting body samples, Holt and Upton when compared to the KW1 genome (Fig 2). Further analysis is required to understand the significance of this difference in variation and whether the Upton and Holt variation is due to the presence of other fungi or oomycetes, such as Phytophthora sp. and Togninia sp., in the mixed samples interfering with the SNP calling.

 


Fig 2: Number of SNPs normalised to the number of bases with >=10X coverage.

References

Boddi, S., et al. (2004) Cerato-platanin protein is located in the cell walls of ascospores, conidia and hyphae of Ceratocystis fimbriata f. sp. platani, FEMS Microbiology Letters, 233, 341-346.

Koboldt, D.C., et al. (2012) VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing, Genome Research, 22, 568-576.

Pazzagli, L., et al. (1999) Purification, characterization, and amino acid sequence of cerato-platanin, a new phytotoxic protein from Ceratocystis fimbriata f. sp. platani, The Journal of Biological Chemistry, 274, 24959-24964.

Pazzagli, L., et al. (2006) Cerato-platanin, the first member of a new fungal protein family: cloning, expression, and characterization, Cell Biochemistry and Biophysics, 44, 512-521.

Trapnell, C., Pachter, L. and Salzberg, S.L. (2009) TopHat: discovering splice junctions with RNA-Seq, Bioinformatics (Oxford, England), 25, 1105-1111.

 

Metagenomic analysis of Fraxinus excelsior, Chalara fraxinea and infected material.

Contributors

Christine Sambles, David Studholme. University of Exeter, Devon.

Introduction

To check their species composition, we performed a metagenomic analysis on several datasets that had been generated from samples described as Fraxinus excelsior, Chalara fraxinea or as mixed infected material. We used MEGAN (v4, MEtaGenome Analyzer (Huson, et al., 2011)) and the assembled transcripts from F. excelsior and C. fraxinea to identify the taxonomic groups to which uninfected sample transcripts are allocated, to use as a reference database for binning. We identified many transcripts specific to the infected samples (not in uninfected Fraxinus) that fell outside of the Chalara and Fraxinus bins; these may indicate additional microbial species present during infection. These might include species acting synergistically with C. fraxinea during infection or opportunists present as a secondary consequence of infection. The uninfected F. excelsior sample identifies species that are part of the ‘normal’ or ‘healthy’ tree microbiota and could therefore be excluded from the list of infection-related species.

Material

Transcriptome assemblies:

F. excelsior: ATU1

C. fraxinea:  KW1

Mixed material: AT1, AT2, Upton, Holt

BLASTX against GenBank:

F. excelsior: ATU1

C. fraxinea: KW1

Mixed material: AT1, AT2, Upton, Holt

Methods

We identified sequence similarity between assembled transcripts and GenBank protein sequences using BLASTX; we used as queries the transcripts from uninfected F. excelsior (ATU1), a C. fraxinea isolate (KW1) and four mixed material samples (AT1, AT2, Upton and Holt). We loaded the output from BLASTX into MEGAN and performed taxonomic binning using a minimum support value of 35, a minimum BLAST score of 50 and only retaining hits whose bit scores lie within 10% of the best score. The analyses were normalised, compared and rendered within MEGAN.
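To make the binning parameters above more concrete, here is a simplified sketch of lowest-common-ancestor (LCA) assignment with a minimum score and a "top percent" filter. It is an illustration of the general idea only, not MEGAN's code: the minimum support step (35) is omitted, the lineages are shortened by hand for the example, and all function names are hypothetical.

```python
def filter_hits(hits, min_score=50.0, top_percent=10.0):
    """Keep hits scoring >= min_score and within top_percent of the best bit score.

    hits is a list of (bit_score, lineage) tuples, where lineage is a
    root-to-leaf tuple of taxon names.
    """
    kept = [h for h in hits if h[0] >= min_score]
    if not kept:
        return []
    best = max(score for score, _ in kept)
    cutoff = best * (1.0 - top_percent / 100.0)
    return [h for h in kept if h[0] >= cutoff]


def lowest_common_ancestor(lineages):
    """Return the shared prefix of several root-to-leaf lineages."""
    prefix = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        prefix.append(ranks[0])
    return tuple(prefix)


def assign_transcript(hits, min_score=50.0, top_percent=10.0):
    """Assign a transcript to the LCA of its retained hits, or None if no hit passes."""
    kept = filter_hits(hits, min_score, top_percent)
    if not kept:
        return None
    lca = lowest_common_ancestor([lineage for _, lineage in kept])
    return lca[-1] if lca else "root"


# Hand-shortened example lineages (illustrative only).
GLAREA = ("Fungi", "Ascomycota", "Leotiomycetes", "Helotiales", "Glarea lozoyensis")
MARSSONINA = ("Fungi", "Ascomycota", "Leotiomycetes", "Helotiales", "Marssonina brunnea")
PLANT = ("Viridiplantae", "Streptophyta", "Magnoliophyta", "Fraxinus excelsior")

hits = [(200.0, GLAREA), (185.0, MARSSONINA), (120.0, PLANT)]
print(assign_transcript(hits))
```

In this example the plant hit falls outside the top 10% of scores and is ignored, so the transcript is placed at Helotiales, the deepest taxon shared by the two remaining hits. This is also why a species with few sequenced relatives tends to be binned at a higher rank or to a neighbouring taxon.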

Results

For each of the six samples, we identified the numbers of transcripts assigned to Helotiales (the order in which C. fraxinea is classified) and to Viridiplantae (green plants). The results are summarised in Table 1. Transcripts from the (nominally) C. fraxinea isolate KW1 binned into the classes Dothideomycetes, Eurotiomycetes, Leotiomycetes and Sordariomycetes, which all reside within the subphylum Pezizomycotina. This result is consistent with the sequenced sample being pure Chalara. As expected, all of the bin-able transcripts from F. excelsior ATU1 fell within the Viridiplantae kingdom, specifically within the group of flowering plants (Magnoliophyta).

Sample  Sample type     Helotiales                  Viridiplantae               Non H/V
                        Normalised reads  Percent   Normalised reads  Percent   Percent
ATU1    F. excelsior    0                 0%        79,438            79%       21%
KW1     C. fraxinea     32,350            32%       0                 0%        68%
AT1     Mixed material  15,853            16%       47,202            47%       37%
AT2     Mixed material  8,889             8.9%      54,434            54%       37%
Upton   Mixed material  12,619            13%       12,798            13%       74%
Holt    Mixed material  6,588             6.6%      31,741            32%       61%

Table 1: Reads binned to Helotiales and Viridiplantae in normalised comparison for each sample.

In the data from the Upton mixed material, 74% of the transcripts were not binned within the Helotiales or Viridiplantae, which is where C. fraxinea and F. excelsior transcripts are expected to fall, based on the results from the pure isolate of C. fraxinea and the uninfected F. excelsior. In the Upton data, 34% of the total number of transcripts were assigned to Oomycetes, specifically 33% to Phytophthora spp. Additionally, 13% were not assigned to any taxon and a further 11% had no significant similarity, detectable by BLAST, to proteins in the GenBank database. The presence of Phytophthora spp. might be attributed to cross-lane contamination during sequencing, since the Norwich laboratory handling the Upton data also works with Phytophthora infestans. A similar contamination had also been reported for Fera samples, with Maize Chlorotic Mottle Virus (MCMV) and Sugarcane Mosaic Virus (SMV) sequences being present. This is a common problem in Illumina sequencing, which has led to the incorporation of a taxonomic binning step into sequencing pipelines, including at our own sequencing facility at the University of Exeter. These contamination issues highlight the importance of confirming the taxonomic distribution of sequence data, in addition to quality checks, before performing any downstream analyses. Once identified, the contaminant reads can be removed from the dataset.

Analysis of the Holt mixed material sample showed that 1.5% of transcripts were binned to Togninia minima, an ascomycete in the order Calosphaeriales. T. minima is a pathogen of grapevines and Prunus spp. However, the closely related T. fraxinopennsylvanica (anamorph: Phaeoacremonium mortoniae) has been observed in dead vascular tissue of declining ash tree branches (Fraxinus latifolia) in California (Eskalen, et al., 2005a; Eskalen, et al., 2005b). It may be that T. fraxinopennsylvanica is present in the Holt material and that the reads were assigned to T. minima because that is the most closely related species for which extensive sequence data are available.

For the nominally pure sample of C. fraxinea isolate KW1, only 32% of reads were assigned to Helotiales. However, 66% of reads were assigned to Fungi, 16% were not assigned to a taxon and 17% had no significant similarity to proteins in the GenBank BLAST database. This is likely to be due to insufficient sequence data in the GenBank database from Chalara and closely related species.

 

Fig 1: Metagenomic analysis of Fraxinus excelsior, Chalara fraxinea and four infected material samples (AT1, AT2, Upton and Holt).

Further analysis using alignments to the F. excelsior and C. fraxinea genomes will help to determine whether other taxa indicated by the MEGAN analysis, such as Togninia sp., are genuinely present or whether they are mis-assigned reads resulting from the lack of Chalara- and Fraxinus-related proteins in the database.

References

Eskalen, A., Rooney-Latham, S. and Gubler, W.D. (2005a) First Report of Perithecia of Phaeoacremonium viticola on Grapevine (Vitis vinifera) and Ash Tree (Fraxinus latifolia) in California, Plant Disease, 89, 686.

Eskalen, A., Rooney-Latham, S. and Gubler, W.D. (2005b) Occurrence of Togninia fraxinopennsylvanica on Esca-Diseased Grapevines (Vitis vinifera) and Declining Ash Trees (Fraxinus latifolia) in California, Plant Disease, 89, 528.

Huson, D.H., et al. (2011) Integrative analysis of environmental sequences using MEGAN4, Genome Research, 21, 1552-1560.

 

Analysis of the Chalara genome suggests that it is AT repeat rich

The Contributor

Dan MacLean

The Analysis

I analysed the TGAC KW1 assembly of Hymenoscyphus pseudoalbidus / Chalara fraxinea to display various features of the assembly.

 


  1. (outer black ring) – scaffolds of length >= 10 kbp
  2. (blue stacks) – gene models
  3. (blue heatmap) – gene density in 10 kbp windows
  4. (line plot) – aligned distance between paired-end reads for the library with 196 bp insert size (< 31 bp – 5th percentile – rendered in orange, > 351 bp – 95th percentile – rendered in blue)
  5. (line plot) – aligned distance between paired-end reads for the library with 570 bp insert size (< 265 bp – 5th percentile – rendered in orange, > 2313 bp – 95th percentile – rendered in blue). Thresholds calculated as per (https://github.com/danmaclean/h_pseu_analysis/blob/master/circos/scripts/assess_insert_size_distributions.md)
  6. (pink line plot) – uniquely mapped read coverage. Maximum plotted = 300, minimum plotted = 80
  7. (pink line plot) – GC percent. Black line = 50% GC, light grey area = 50% – 30% GC, dark grey area = < 30% GC. Maximum plotted = 60% GC, minimum plotted = 20% GC
  8. (green stacks) – RepeatMasker matches.
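The colouring of the two insert-size tracks amounts to comparing each aligned pair distance with the 5th and 95th percentiles of the library's distribution. The sketch below illustrates that idea only and is not the linked script. The nearest-rank percentile definition, the function names and the toy distances are all assumptions.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct between 0 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def classify_pairs(distances, low_pct=5, high_pct=95):
    """Label each aligned pair distance relative to the percentile thresholds."""
    low = percentile(distances, low_pct)
    high = percentile(distances, high_pct)
    labels = []
    for distance in distances:
        if distance < low:
            labels.append("short")   # rendered in orange in the plot
        elif distance > high:
            labels.append("long")    # rendered in blue in the plot
        else:
            labels.append("normal")
    return low, high, labels


# Toy distances only: the integers 1..100, not real alignment data.
toy_distances = list(range(1, 101))
low, high, labels = classify_pairs(toy_distances)
print(low, high, labels.count("short"), labels.count("long"))
```

Other percentile definitions, such as linear interpolation, would give slightly different thresholds on real data, so the values quoted in the list should be taken from the linked script rather than from this sketch.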

 

http://figshare.com/articles/Genome_analysis_of_the_Ash_Dieback_Fungus_suggests_a_repeat_rich_genome/791640

The interpretation

The genome assembly contains numerous low-GC, high-coverage, repeat-rich regions with low overall gene density. This suggests an assembly with collapsed repeats from a genome that is rich in repeats overall.
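Low-GC regions of the kind described here can be located by computing GC percent in fixed windows along each scaffold. The following is a minimal sketch, not the code used to draw the plot. The window size for the GC track is not given above, so the 10 kbp default is borrowed from the gene density track as an assumption. The 30% cut-off is the boundary between the grey bands in the plot. All function names are hypothetical.

```python
def gc_percent(seq):
    """GC percent over called bases (A, C, G, T); None if there are none."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return None
    return 100.0 * (seq.count("G") + seq.count("C")) / called


def windowed_gc(seq, window=10_000):
    """Yield (start, gc_percent) for consecutive non-overlapping windows."""
    for start in range(0, len(seq), window):
        yield start, gc_percent(seq[start:start + window])


def low_gc_windows(seq, window=10_000, threshold=30.0):
    """Return start coordinates of windows whose GC percent is below threshold."""
    return [
        start
        for start, gc in windowed_gc(seq, window)
        if gc is not None and gc < threshold
    ]


# Toy sequence only: one GC-rich window followed by two AT-rich windows.
toy_seq = "GC" * 10 + "AT" * 10 + "ATATATATGC" * 2
print(low_gc_windows(toy_seq, window=20))
```

Windows consisting entirely of ambiguous bases are skipped rather than counted as 0% GC, so that gaps between contigs in a scaffold are not reported as AT-rich.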

Cite this:

http://dx.doi.org/10.6084/m9.figshare.791640

Lignin degrading enzymes in the Chalara genome assembly

The contributors

Dan MacLean

The analysis

To extend the analysis of Chalara’s ability to degrade wood, and thus invade a tree directly, I examined the lignin-decaying gene complement in the genome. The previous work on identifying decay-related enzymes by Floudas et al. (http://dx.doi.org/10.1126/science.1221748) was used as a base.

The list of proteins in the TGAC 1.1 KW1 genome assembly of Chalara fraxinea and the list of wood-decaying proteins in Floudas et al. were used as input to BLAST searches to identify Chalara proteins with strong sequence identity to the decay-related proteins (see protocol: https://github.com/danmaclean/h_pseu_analysis/blob/master/interesting_wood_gene_analysis.md).

For each group in the Floudas decay-related list, Chalara proteins were screened to identify those with >50% sequence identity to the representative protein in at least 10 members. The number of Chalara proteins passing this threshold was used as the estimate of the number of members of each group in the Chalara genome.
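The counting step can be sketched as below. This is one possible reading of the criterion, not the linked protocol: a Chalara protein is counted for a group if it has >50% identity to at least 10 distinct reference members of that group. The input shapes (hits as tuples, a member-to-group mapping), the function name and the toy identifiers are all hypothetical.

```python
from collections import defaultdict


def count_group_members(hits, member_to_group, min_identity=50.0, min_members=10):
    """Count Chalara proteins per decay-related group.

    hits: iterable of (chalara_protein, reference_member, percent_identity)
    member_to_group: {reference_member: group_name}
    """
    # (group, chalara_protein) -> reference members matched above the cut-off
    matched = defaultdict(set)
    for protein, member, identity in hits:
        group = member_to_group.get(member)
        if group is not None and identity > min_identity:
            matched[(group, protein)].add(member)

    # A protein counts once per group if it matched enough distinct members.
    counts = defaultdict(int)
    for (group, _protein), members in matched.items():
        if len(members) >= min_members:
            counts[group] += 1
    return dict(counts)


# Toy data only: one made-up group with 12 reference members.
toy_groups = {f"A{i}": "groupA" for i in range(12)}
toy_hits = (
    [("chal1", f"A{i}", 60.0) for i in range(11)]    # 11 members above 50%
    + [("chal2", f"A{i}", 80.0) for i in range(5)]   # only 5 members
    + [("chal3", f"A{i}", 45.0) for i in range(12)]  # identity too low
)
print(count_group_members(toy_hits, toy_groups))
```

With BLAST tabular output, the three fields per hit would correspond to the query ID, subject ID and percent identity columns. Which side is query and which is subject depends on how the search was run.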

 


The interpretation

The analysis shows that the Chalara genome is in fact very poor in the sorts of enzymes that wood-rotting fungi typically use to decay wood.

 

Assessing the origin of the UK Ash dieback pathogen

The contributors

Kentaro Yoshida, Dan MacLean, Daniel Bunting, Diane Saunders at The Sainsbury Laboratory

The analysis

To clarify the origin of the ash dieback pathogen, we constructed a phylogenetic tree of UK, French and Japanese isolates. Based on high-quality SNPs and alignments of short reads from RNA sequencing of infected ash tree samples from seven locations across Norfolk and Suffolk in the UK, and of Japanese samples, we reconstructed consensus gene sequences for the tested samples. We searched for single-copy genes in Fusarium graminearum, Sclerotinia sclerotiorum, Botrytis cinerea, Hymenoscyphus albidus and H. pseudoalbidus using OrthoMCL (Li et al. 2003 Genome Res. 13:2178-89). A total of 2,964 single-copy genes were identified and used for the construction of the tree. We built a maximum likelihood phylogenetic tree based on the third codon positions of these genes using RAxML (Stamatakis et al. 2005 Bioinformatics 21:456-463).
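Two steps in this procedure are simple enough to illustrate directly: selecting single-copy genes from orthologue clusters, and reducing each gene to its third codon positions. The sketch below is not the pipeline we ran. It assumes that clusters are available as lists of (species, gene) pairs and that each sequence is codon-aligned and in frame. The function names, species codes and toy data are hypothetical.

```python
def single_copy_clusters(clusters, species):
    """Keep clusters that contain exactly one gene from every listed species.

    clusters: {cluster_id: [(species, gene_id), ...]}
    """
    wanted = sorted(species)
    kept = {}
    for cluster_id, members in clusters.items():
        if sorted(sp for sp, _gene in members) == wanted:
            kept[cluster_id] = members
    return kept


def third_codon_positions(aligned_cds):
    """Return every third character of an in-frame, codon-aligned sequence."""
    if len(aligned_cds) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return aligned_cds[2::3]


# Toy data only: made-up clusters for two species codes.
toy_clusters = {
    "c1": [("Fg", "g1"), ("Ss", "s1")],                # single copy in both
    "c2": [("Fg", "g2"), ("Fg", "g3"), ("Ss", "s2")],  # duplicated in Fg
    "c3": [("Fg", "g4")],                              # missing from Ss
}
print(sorted(single_copy_clusters(toy_clusters, ["Fg", "Ss"])))
print(third_codon_positions("ATGGCTAAA"))
```

Third codon positions are mostly synonymous and therefore tend to vary more than first and second positions, which is useful when the isolates being compared are closely related.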


Maximum likelihood trees based on 3rd codon positions of 2,964 genes.
Numbers on branches show bootstrap probabilities.
(A) All tested samples.
(B) The enlarged tree in Hymenoscyphus species.

The interpretation

The tree showed that H. albidus and H. pseudoalbidus were clearly separated. The Japanese isolates were separated from the single common ancestor of the European isolates, suggesting that the Japanese isolates may be ancestral. This observation supports the idea that the ash dieback pathogen originated from Asian isolates (Queloz et al. 2011 For. Path. 41:133-142; Pautasso et al. 2013 Biological Conservation 158:37-49).

Acknowledgements

Tsuyoshi Hosoya from the National Museum of Nature and Science, Japan kindly provided Japanese samples.
Steve Collin from the Norfolk Wildlife Trust kindly collected some of the UK samples.