Identification of protein-coding genes putatively involved in infection by combining metagenomics analysis and protein orthologue clustering.

Contributors

Christine Sambles and David Studholme. University of Exeter, Devon.

Introduction

In order to identify fungal protein-coding genes associated with Fraxinus:Hymenoscyphus in planta interactions, we took an orthologue clustering approach. Identifying fungal transcripts that are present in four samples taken from infected ash, and then removing those that are also present in the KW1 isolate, could reveal infection-related transcripts from H. pseudoalbidus. Similarly, F. excelsior transcripts that are present in the infected material but absent from F. excelsior with no signs of infection could identify transcripts involved in the plant's response to infection by H. pseudoalbidus.

Material

Transcriptome assemblies:

F. excelsior: ATU1

C. fraxinea: KW1

Mixed material: AT1, AT2, Upton, Holt

Output from BLASTX searches against GenBank:

F. excelsior: ATU1

C. fraxinea: KW1

Mixed material: AT1, AT2, Upton, Holt

Methods & Results

We used MEGAN, as previously described (http://oadb.tsl.ac.uk/?p=704), to assign transcripts to taxonomic bins. These transcripts came from five transcript assemblies:

  • one H. pseudoalbidus isolate (KW1) and
  • four mixed-material samples (AT1, AT2, Holt & Upton).

This resulted in 36,945 transcripts being allocated to the bin for order Helotiales.
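MEGAN bins each transcript with a lowest-common-ancestor (LCA) rule: the transcript is placed on the deepest taxon shared by all of its sufficiently good BLASTX hits, so a transcript whose top hits span fungi and plants ends up on a broad node rather than in the Helotiales bin. The sketch below illustrates that rule only. It assumes a tiny hypothetical taxonomy and placeholder thresholds (min_score, top_percent); it is not MEGAN's code, and these values are not the settings used in this analysis.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical, heavily simplified taxonomy (child -> parent); ranks are skipped.
# Real binning uses the full NCBI taxonomy.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "Eukaryota": "root",
    "Fungi": "Eukaryota",
    "Helotiales": "Fungi",
    "Hymenoscyphus": "Helotiales",
    "Glarea": "Helotiales",
    "Viridiplantae": "Eukaryota",
    "Fraxinus": "Viridiplantae",
}


def lineage(taxon: str) -> List[str]:
    """Path from the root down to `taxon`."""
    path: List[str] = []
    node: Optional[str] = taxon
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path[::-1]


def lowest_common_ancestor(taxa: Iterable[str]) -> str:
    """Deepest taxon present in the lineages of all `taxa`."""
    paths = [lineage(t) for t in taxa]
    lca = "root"
    for level in zip(*paths):  # walk down all lineages in parallel
        if all(node == level[0] for node in level):
            lca = level[0]
        else:
            break
    return lca


def assign_taxon(hits: List[Tuple[str, float]],
                 min_score: float = 50.0,    # placeholder bit-score floor
                 top_percent: float = 10.0   # placeholder: keep hits within 10% of best
                 ) -> Optional[str]:
    """LCA-style assignment from (taxon, bit score) hits; None means unassigned."""
    good = [(t, s) for t, s in hits if s >= min_score]
    if not good:
        return None
    best = max(s for _, s in good)
    cutoff = best * (1.0 - top_percent / 100.0)
    return lowest_common_ancestor({t for t, s in good if s >= cutoff})


def in_bin(taxon: Optional[str], bin_taxon: str = "Helotiales") -> bool:
    """True if the assigned taxon is `bin_taxon` or one of its descendants."""
    return taxon is not None and bin_taxon in lineage(taxon)


print(assign_taxon([("Hymenoscyphus", 300.0), ("Glarea", 290.0)]))    # Helotiales
print(assign_taxon([("Hymenoscyphus", 300.0), ("Fraxinus", 295.0)]))  # Eukaryota
```

The second call shows why a transcript with near-equal fungal and plant hits is not counted in the Helotiales bin.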

The longest open reading frame for each Helotiales-binned transcript (Table 1) was translated into a predicted protein sequence. These protein sequences were clustered using OrthoMCL.
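The report does not say how open reading frames were defined. As an illustration only, the sketch below assumes an ORF starts at an ATG and runs to the first in-frame stop codon, or to the end of the transcript when no stop is found (assembled transcripts may be incomplete), searches all six reading frames, and translates with the standard genetic code (NCBI table 1). The function names are hypothetical; dedicated ORF finders apply their own rules.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def translate(nt: str) -> str:
    """Translate complete codons; stops become '*', codons with ambiguous bases 'X'."""
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X")
                   for i in range(0, len(nt) - 2, 3))


def longest_orf_protein(transcript: str) -> str:
    """Protein from the longest ATG-initiated ORF over all six frames (stop excluded)."""
    seq = transcript.upper()
    best = ""
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            protein = translate(strand[frame:])
            # The longest ORF in each stop-free stretch begins at its first Met;
            # the final stretch may run off the 3' end without a stop codon.
            for stretch in protein.split("*"):
                start = stretch.find("M")
                if start != -1 and len(stretch) - start > len(best):
                    best = stretch[start:]
    return best


print(longest_orf_protein("CCATGAAATTTGGGTAACC"))  # MKFG
```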

Table 1: Numbers of transcripts and percentages of all transcripts for each sample or isolate that were binned to the order Helotiales using MEGAN.


Sample/isolate       AT1       AT2       Holt      Upton     KW1       ATU1
Helotiales           8,214     7,403     6,930     7,410     6,561     0
% all transcripts    15.61%    8.80%     6.44%     12.25%    31.75%    0.00%

OrthoMCL analysis

Between 4,548 and 5,551 proteins were clustered from each sample; the number of protein clusters was 6,505 in total. A Venn diagram of the clustered proteins can be seen in Figure 1.


Fig 1: Venn diagram of Helotiales-binned proteins clustered with OrthoMCL for one H. pseudoalbidus isolate (KW1) and four mixed material samples from H. pseudoalbidus infected F. excelsior (AT1, AT2, Holt and Upton).

There was a core set of 3,118 protein clusters shared by all five samples. A set of 113 protein clusters was identified only in the H. pseudoalbidus samples from infected F. excelsior (AT1, AT2, Holt & Upton), and 33 were identified only in KW1, an H. pseudoalbidus isolate. These are referred to below as the ‘in planta’ and ‘ex planta’ groups, respectively.
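Each OrthoMCL cluster can be placed in a Venn region from the set of samples contributing proteins to it. The sketch below assumes the Venn-region reading of the groups above: ‘core’ means all five samples, ‘in planta’ means exactly AT1, AT2, Holt and Upton with no KW1 member, and ‘ex planta’ means KW1 alone. The wording could also be read more loosely (any in planta sample, no KW1); that variant would replace the `region == IN_PLANTA` test with `region and region <= IN_PLANTA`. Function names are hypothetical, and the demo input is the four CFEM-containing clusters listed later in this report.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, Set

IN_PLANTA: FrozenSet[str] = frozenset({"AT1", "AT2", "Holt", "Upton"})
EX_PLANTA: FrozenSet[str] = frozenset({"KW1"})
ALL_SAMPLES: FrozenSet[str] = IN_PLANTA | EX_PLANTA


def classify_cluster(samples: Iterable[str]) -> str:
    """Venn region of a cluster, given the samples that contributed proteins to it."""
    region = frozenset(samples) & ALL_SAMPLES
    if region == ALL_SAMPLES:
        return "core"
    if region == IN_PLANTA:   # assumption: exact four-sample region, KW1 absent
        return "in planta"
    if region == EX_PLANTA:
        return "ex planta"
    return "other"


def tally(clusters: Dict[str, Set[str]]) -> Counter:
    """Count clusters per Venn region."""
    return Counter(classify_cluster(samples) for samples in clusters.values())


cfem_clusters = {
    "HELO2454": {"AT1", "AT2", "Holt", "Upton"},
    "HELO4337": {"AT1", "AT2", "Holt", "Upton", "KW1"},
    "HELO5213": {"AT1", "Holt", "Upton", "KW1"},
    "HELO5952": {"AT2", "Upton", "KW1"},
}
for cluster_id, samples in cfem_clusters.items():
    print(cluster_id, classify_cluster(samples))
```

Under this reading, HELO2454 is the only CFEM cluster in the in planta group and HELO4337 falls in the core set.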

The 113 protein clusters found only in H. pseudoalbidus-infected F. excelsior (in planta) contained a total of 565 transcripts (459 excluding isoforms). We annotated the transcript sequences based on the results of BLASTX searches. Additionally, the GO, EC, KEGG, Pfam and CAZy (Carbohydrate-Active enZymes) databases were used to annotate the full set of 565 transcripts.

GO, EC and KEGG annotations were inferred using annot8r (Schmid and Blaxter 2008), Pfam domains were identified with Pfam scan (a wrapper script around hmmpfam), and CAZy family members were annotated using the CAZymes Analysis Toolkit (CAT) (Park, Karpinets et al. 2010).
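The domain comparisons that follow need, for each Pfam family, the set of proteins carrying it. A minimal parsing sketch, assuming the whitespace-delimited output of the HMMER3-based pfam_scan.pl (comment lines start with '#', the first column is the sequence ID and the seventh is the Pfam family name); the hmmpfam-based version named above may format its output differently. The function name, sequence IDs and scores in the example are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def proteins_per_family(pfam_scan_lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each Pfam family name to the IDs of proteins with at least one hit to it."""
    families: Dict[str, Set[str]] = defaultdict(set)
    for line in pfam_scan_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split()
        seq_id, family = fields[0], fields[6]  # assumed column layout (see lead-in)
        families[family].add(seq_id)  # a set: repeated domains count once per protein
    return dict(families)


# Invented example lines in the assumed format.
example_output = """\
# <seq id> <alignment start> <alignment end> <envelope start> <envelope end> <hmm acc> <hmm name> <type> ...
AT1_prot1    12 220  10 224 PF01379 Porphobil_deam  Domain 1 211 213 250.1 3.1e-75 1 No_clan
AT1_prot1   232 305 231 307 PF03900 Porphobil_deamC Domain 1  74  76  70.2 2.0e-20 1 No_clan
Upton_prot7   5  90   3  92 PF01627 Hpt             Family 2  88  90  55.0 1.1e-15 1 No_clan
Upton_prot7 120 200 118 203 PF01627 Hpt             Family 2  88  90  40.3 4.0e-11 1 No_clan
""".splitlines()

counts = {family: len(ids) for family, ids in proteins_per_family(example_output).items()}
print(counts)  # {'Porphobil_deam': 1, 'Porphobil_deamC': 1, 'Hpt': 1}
```

Counting distinct proteins rather than raw hits keeps a protein with two Hpt matches from being counted twice.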

GO analysis revealed a lower proportion of growth-related proteins and a higher proportion of cell differentiation and proliferation proteins in the infected material (Fig 2).

Fig 2: Gene Ontology (GO) analysis of the pan-proteome (KW1, AT1, AT2, Upton, Holt) compared with the in planta proteins. The in planta proteins were translated from Helotiales-binned transcripts (MEGAN) and were identified only in H. pseudoalbidus samples from infected F. excelsior (AT1, AT2, Holt & Upton). The pan-proteome proteins were also translated from Helotiales-binned transcripts (MEGAN) and include the isolate KW1.

Pfam and CAZy analysis of the 565 transcripts of the in planta group resulted in 88 Pfam domains/families; the transcripts assigned to CAZy families carried the following Pfam domains:

  • Glycosyl hydrolases family 18 (Pfam: Glyco_hydro_18, PF00704)
  • Alcohol dehydrogenase GroES-like domain (Pfam: ADH_N, PF08240) & Zinc-binding dehydrogenase (Pfam: ADH_zinc_N, PF00107)
  • alpha/beta hydrolase fold (Pfam: Abhydrolase_3, PF07859)
  • Protein of unknown function DUF2029, a putative bacterial transmembrane protein likely conserved among Mycobacterium species (Pfam: DUF2029, PF09594) & PAP2 superfamily (Pfam: PAP2_3, PF14378)
  • Regulator of chromosome condensation (RCC1) repeat (Pfam: RCC1, PF00415)
  • Chalcone-flavanone isomerase (Pfam: Chalcone, PF02431)
  • Myosin head (motor domain) (Pfam: Myosin_head, PF00063) & Chitin synthase (Pfam: Chitin_synth_2, PF03142)
  • Pfam: RhgB_N, fn3_3 & CBM-like

BLASTX hits from the in planta transcripts included a putative CFEM domain-containing protein (Marssonina brunnea) and a Galactose mutarotase-like protein (Glarea lozoyensis). The galactose mutarotase-like protein is of interest because it is also similar to rhamnogalacturonan lyase from Aspergillus spp., an enzyme that degrades plant cell walls by cleaving the pectin backbone (de Vries and Visser 2001). Some CFEM-containing proteins are proposed to have important roles in fungal pathogenesis (Kulkarni, Kelkar et al. 2003).

Comparisons of Pfam domain content among samples

Pfam domains and families in the ‘pan-proteome’ of KW1, AT1, AT2, Holt & Upton were identified using the hmmpfam wrapper script, Pfam scan. These were compared with the Pfam annotation of the ‘in planta’ group to identify domains over-represented within this group. Domains and families for which more than 80% of the pan-proteome annotations were found in the ‘in planta’ group are shown in Table 2.
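The selection rule for the table that follows is a simple ratio test: in planta count divided by pan-proteome count, kept if above 0.8. A minimal sketch, assuming the counts are numbers of annotated proteins (as the bracketed pairs given later in this report suggest); the function name is hypothetical. Run on counts quoted in this report, it keeps Porphobil_deam, Copper-bind and Hpt and rejects CFEM at 5/14.

```python
from typing import Dict


def overrepresented_in_planta(in_planta: Dict[str, int],
                              pan_proteome: Dict[str, int],
                              threshold: float = 0.80) -> Dict[str, float]:
    """Families where more than `threshold` of pan-proteome annotations are in planta."""
    selected: Dict[str, float] = {}
    for family, pan_count in pan_proteome.items():
        if pan_count == 0:
            continue  # avoid division by zero for families absent from the pan-proteome
        fraction = in_planta.get(family, 0) / pan_count
        if fraction > threshold:  # strictly greater, matching ">80%"
            selected[family] = fraction
    return selected


# In planta / pan-proteome counts quoted in this report.
pan = {"Porphobil_deam": 6, "Copper-bind": 3, "Hpt": 5, "CFEM": 14}
in_planta = {"Porphobil_deam": 6, "Copper-bind": 3, "Hpt": 5, "CFEM": 5}

print(overrepresented_in_planta(in_planta, pan))
# {'Porphobil_deam': 1.0, 'Copper-bind': 1.0, 'Hpt': 1.0}
print(f"CFEM: {in_planta['CFEM'] / pan['CFEM']:.2%}")  # CFEM: 35.71%
```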

Table 2: Pfam domains and families for which more than 80% of ‘pan-proteome’ annotations were present in the ‘in planta’ group (http://pfam.sanger.ac.uk/).


Domain/Family     Pfam accession   Name
ATP12             PF07542          ATP12 chaperone protein
BOP1NT            PF08145          BOP1NT (NUC169) domain
iPGM_N            PF06415          BPG-independent PGAM N-terminus
CDC37_M           PF08565          Cdc37 Hsp90 binding domain
CDC37_N           PF03234          Cdc37 N terminal kinase binding domain
CDC37_C           PF08564          Cdc37 C terminal domain
Chalcone          PF02431          Chalcone-flavanone isomerase
Copper-bind       PF00127          Copper binding proteins, plastocyanin/azurin family
Sdh5              PF03937          Flavinator of succinate dehydrogenase
HD_3              PF13023          HD domain
Hpt               PF01627          Hpt domain
Metalloenzyme     PF01676          Metalloenzyme superfamily
CENP-I            PF07778          Mis6
Myosin_tail_1     PF01576          Myosin tail
TRM               PF02005          N2,N2-dimethylguanosine tRNA methyltransferase
Es2               PF09751          Nuclear protein Es2
Tom37             PF10568          Outer mitochondrial membrane transport complex protein
PAP2_3            PF14378          PAP2 superfamily
PMC2NT            PF08066          PMC2NT (NUC016) domain
Porphobil_deam    PF01379          Porphobilinogen deaminase, dipyromethane cofactor binding domain
Porphobil_deamC   PF03900          Porphobilinogen deaminase C-terminal domain
DUF2012           PF09430          Protein of unknown function
DUF775            PF05603          Protein of unknown function
Prp31_C           PF09785          Prp31 C terminal domain
Ribosomal_L32p    PF01783          Ribosomal L32p protein family

Several of the Pfam hits struck us as interesting; these are described below. The numbers in brackets give the number found in the in planta group / the number found in the entire ‘pan-proteome’:

Porphobil_deam and Porphobil_deamC (6/6) were found in two AT1 isoforms, AT2, two Holt isoforms and Upton. No peptides with these domains were found in the Helotiales-binned KW1 proteome. Heme-biosynthetic porphobilinogen deaminase protects Aspergillus nidulans from nitrosative stress: in A. nidulans, the heme biosynthesis enzyme porphobilinogen deaminase (PBG-D) is a novel nitric oxide (NO)-tolerant protein that modulates the reduction of environmental NO and nitrite by flavohemoglobin (FHB, encoded by fhbA and fhbB) and nitrite reductase (NiR, encoded by niiA) (Zhou, Narukami et al. 2012). NO is part of the plant hypersensitive response, a localized programmed cell death that confines the pathogen to the site of attempted infection (Mur, Carver et al. 2006).

Proteins matching the ‘copper binding proteins, plastocyanin/azurin’ family (Pfam: Copper-bind, PF00127) (3/3) were found in AT1, Holt & Upton. OrthoMCL clustered an AT2 protein with them, but its assembled transcript was incomplete at the 5’ end, so the PF00127 domain was not detected. BLASTX searches indicated amino acid sequence similarity to a cupredoxin from Glarea lozoyensis, and HHpred predicted similarity to cucumber stellacyanin. Given the amino acid sequence similarity between the phytocyanins and fungal laccases, this protein may be a laccase. White-rot fungi (e.g. Trametes cinnabarina, Trametes versicolor and Phlebia radiata) are reported to produce laccases that degrade lignin (Tuor, Winterhalter et al. 1995; Eggert, Temp et al. 1997), and laccase-mediated detoxification of phytoalexins produced by plant defence systems has been observed in Botrytis cinerea (Pezet, Pont et al. 1991; Sbaghi, Jeandet et al. 1996; Adrian, Rajaei et al. 1998; Breuil, Jeandet et al. 1999).

The Hpt domain (Pfam: Hpt, PF01627) (5/5) was identified in two AT1 isoforms, AT2, Upton & Holt. The histidine-containing phosphotransfer (HPt) domain is a protein module with an active histidine residue that mediates phosphotransfer reactions in two-component signalling systems (Catlett, Yoder et al. 2003).

Although below the 80% threshold, 35.71% (5/14) of the CFEM domains (Pfam: PF05730) identified in the ‘pan-proteome’ of KW1, AT1, AT2, Holt & Upton were present in the ‘in planta’ group, and none were present in the ‘ex planta’ group. The CFEM-containing proteins were distributed across four clusters, only one of which (HELO2454) has no KW1 member:

ClusterID    Clustered proteins present in
HELO2454     AT1, AT2, Holt, Upton
HELO4337     AT1, AT2, Holt, Upton, KW1
HELO5213     AT1, Holt, Upton, KW1
HELO5952     AT2, Upton, KW1

 

Fig 3: Phylogenetic tree of H. pseudoalbidus sequences from the four OrthoMCL clusters in which at least one sequence contains a CFEM domain (Pfam: PF05730). Names of full-length proteins are shown in black; names in grey are shorter proteins from incomplete transcript assemblies that lack a CFEM domain but cluster with CFEM-domain sequences because of sequence similarity and inferred orthology. Orthologue clustering was performed on all translated transcripts binned to the Helotiales using MEGAN from the one H. pseudoalbidus isolate (KW1) and all four H. pseudoalbidus samples from infected F. excelsior (AT1, AT2, Holt & Upton).

The 33 clusters (representing 72 peptides) in the ex planta group, which were identified only in the isolate KW1, were annotated with Pfam as described above. This identified 17 Pfam domains/families (Table 3).

Table 3: Pfam domains/families identified in the ex planta group


Domain/Family     Pfam accession   Name
COX1              PF00115          Cytochrome C and Quinol oxidase polypeptide I
DASH_Spc34        PF08657          DASH complex subunit Spc34
Pentapeptide_4    PF13599          Pentapeptide repeats
Vac7              PF12751          Vacuolar segregation subunit 7 P
DHQ_synthase      PF01761          3-dehydroquinate synthase
LtrA              PF06772          Bacterial low temperature requirement A protein
FSH1              PF03959          Serine hydrolase
Tyrosinase        PF00264          Common central domain of tyrosinase
Glyco_hydro_47    PF01532          Glycosyl hydrolase family 47
DUF202            PF02656          Domain of unknown function
SET               PF00856          SET domain
Abhydrolase_1     PF00561          alpha/beta hydrolase fold
adh_short_C2      PF13561          Enoyl-(Acyl carrier protein) reductase
Glyco_hydro_3     PF00933          Glycosyl hydrolase family 3 N terminal domain
ADH_zinc_N        PF00107          Zinc-binding dehydrogenase
AAA               PF00004          ATPase family associated with various cellular activities
adh_short         PF00106          short chain dehydrogenase

The small number of peptides found only in KW1, and in none of the H. pseudoalbidus-infected ash samples, limits the scope for comparative analysis.

Conclusions

Proteins putatively involved in plant-pathogen interactions were identified among translated transcripts found exclusively in planta and not in the isolate KW1. These included a copper-binding protein of the plastocyanin/azurin family, porphobilinogen deaminase, a CFEM domain-containing protein and a galactose mutarotase-like protein.

References

Adrian, M., H. Rajaei, et al. (1998). "Resveratrol Oxidation in Botrytis cinerea Conidia." Phytopathology 88: 472-476.

Breuil, A. C., P. Jeandet, et al. (1999). "Characterization of a Pterostilbene Dehydrodimer Produced by Laccase of Botrytis cinerea." Phytopathology 89: 298-302.

Catlett, N. L., O. C. Yoder, et al. (2003). "Whole-genome analysis of two-component signal transduction genes in fungal pathogens." Eukaryotic Cell 2: 1151-1161.

de Vries, R. P. and J. Visser (2001). "Aspergillus Enzymes Involved in Degradation of Plant Cell Wall Polysaccharides." Microbiology and Molecular Biology Reviews 65: 497-522.

Eggert, C., U. Temp, et al. (1997). "Laccase is essential for lignin degradation by the white-rot fungus Pycnoporus cinnabarinus." FEBS Letters 407: 89-92.

Kulkarni, R. D., H. S. Kelkar, et al. (2003). "An eight-cysteine-containing CFEM domain unique to a group of fungal membrane proteins." Trends in Biochemical Sciences 28: 118-121.

Mur, L. A. J., T. L. W. Carver, et al. (2006). "NO way to live; the various roles of nitric oxide in plant-pathogen interactions." Journal of Experimental Botany 57: 489-505.

Park, B. H., T. V. Karpinets, et al. (2010). "CAZymes Analysis Toolkit (CAT): web service for searching and analyzing carbohydrate-active enzymes in a newly sequenced organism using CAZy database." Glycobiology 20: 1574-1584.

Pezet, R., V. Pont, et al. (1991). "Evidence for oxidative detoxication of pterostilbene and resveratrol by a laccase-like stilbene oxidase produced by Botrytis cinerea." Physiological and Molecular Plant Pathology 39: 441-450.

Sbaghi, M., P. Jeandet, et al. (1996). "Degradation of stilbene‐type phytoalexins in relation to the pathogenicity of Botrytis cinerea to grapevines." Plant Pathology: 139-144.

Schmid, R. and M. L. Blaxter (2008). "annot8r: GO, EC and KEGG annotation of EST datasets." BMC Bioinformatics 9: 180.

Tuor, U., K. Winterhalter, et al. (1995). "Enzymes of white-rot fungi involved in lignin degradation and ecological determinants for wood decay." Journal of Biotechnology 41: 1-17.

Zhou, S., T. Narukami, et al. (2012). "Heme-Biosynthetic Porphobilinogen Deaminase Protects Aspergillus nidulans from Nitrosative Stress." Applied and Environmental Microbiology 78: 103-109.
