Variant analysis of different isolates and fruiting bodies of Chalara fraxinea

Contributors

Christine Sambles, David Studholme. University of Exeter, Devon.

Introduction

To identify the extent of genetic variation in the sequenced samples, we carried out variant analysis of seven isolates (KW1, FERA 105, FERA 232, FERA 233, FERA 88, FERA 93 and FERA 94), two fruiting body samples (MFB1 and PFB1) and four mixed material samples (HP1, UB1, AT1 and AT2).

Materials

Raw reads:

C. fraxinea:  KW1

Fruiting bodies: MFB1, PFB1

Mixed material: AT1, AT2, Upton, Holt

 

Genome assembly

C. fraxinea scaffolds:  KW1

Methods

The quality of the raw reads from the thirteen samples was assessed with FastQC. Adapter and quality trimming was performed with Trim Galore, a wrapper script around FastQC and Cutadapt (Phred cutoff: 20, error rate: 0.1, adapter overlap: 1bp, min. length: 20bp, paired read length cut-off: 35bp). FastQC was automatically run on all trimmed files to confirm trimmed read quality. Raw and trimmed read metrics and the GC content of the raw reads are given in Table 1.
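As a rough illustration of the trimming settings listed above, the following sketch builds (but does not run) a Trim Galore command line for one paired-end sample. The file names and output directory are hypothetical, and the use of --retain_unpaired with --length_1/--length_2 to express the 35bp paired read cut-off is an assumption; the original command line is not given in this report.

```python
def trim_galore_command(read1, read2, out_dir="trimmed"):
    """Build (but do not run) a Trim Galore command for one paired-end sample.

    The numeric settings mirror the parameters quoted in the Methods text.
    File names and out_dir are placeholders, not the authors' actual paths.
    """
    return [
        "trim_galore",
        "--paired",
        "--quality", "20",       # Phred cutoff
        "-e", "0.1",             # maximum allowed error rate
        "--stringency", "1",     # minimum adapter overlap in bp
        "--length", "20",        # discard reads shorter than this after trimming
        "--retain_unpaired",     # assumption: keep singletons that pass the cut-off
        "--length_1", "35",      # unpaired read 1 length cut-off
        "--length_2", "35",      # unpaired read 2 length cut-off
        "--fastqc",              # run FastQC on the trimmed output
        "--output_dir", out_dir,
        read1,
        read2,
    ]


if __name__ == "__main__":
    print(" ".join(trim_galore_command("KW1_R1.fastq.gz", "KW1_R2.fastq.gz")))
```

Building the command as a list keeps each parameter visible next to the setting it encodes, and the list can be passed to subprocess.run unchanged if Trim Galore is installed.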

Trimmed reads were aligned to the pre-assembled KW1 genome from OADB using the splice-aware aligner TopHat (Trapnell, et al., 2009). The resulting alignment BAM file was used to create a pileup file with the mpileup command of SAMtools.
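The alignment and pileup steps can be sketched as two command lines. This is a minimal outline only: the index prefix, reference FASTA and read file names are hypothetical, and any additional TopHat or SAMtools options used in the original analysis are not recorded here.

```python
def alignment_commands(index_prefix, reference_fasta, read1, read2, out_dir):
    """Return the TopHat and samtools mpileup commands for one sample.

    Nothing is executed here; the lists can be handed to subprocess.run.
    All paths are placeholders chosen for illustration.
    """
    # TopHat writes its alignments to <out_dir>/accepted_hits.bam
    tophat = ["tophat", "-o", out_dir, index_prefix, read1, read2]
    bam = f"{out_dir}/accepted_hits.bam"
    # mpileup prints the pileup to standard output; redirect it to a file
    mpileup = ["samtools", "mpileup", "-f", reference_fasta, bam]
    return tophat, mpileup


if __name__ == "__main__":
    for cmd in alignment_commands(
        "KW1_index", "KW1_scaffolds.fa", "MFB1_R1.fq", "MFB1_R2.fq", "MFB1_tophat"
    ):
        print(" ".join(cmd))
```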

The variant detection software VarScan 2 (pileup2snp or mpileup2snp) was used to call SNPs with the options --p-value 0.05 --min-coverage 10 --output-vcf --min-var-freq 0.95 (Koboldt, et al., 2012). The number of SNPs in each sample was normalised to the number of bases with >=10X coverage to take into account the different depths of sequencing.
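For reference, a VarScan 2 call with these thresholds might be assembled as follows. This is a sketch under stated assumptions: the jar file name and pileup path are hypothetical, and the value 1 after --output-vcf follows the usual VarScan convention rather than anything stated in this report.

```python
def varscan_snp_command(mpileup_path, varscan_jar="VarScan.jar"):
    """Build (but do not run) a VarScan 2 mpileup2snp command.

    Thresholds follow the Methods text: p-value 0.05, minimum coverage 10,
    minimum variant allele frequency 0.95. VarScan prints its calls to
    standard output, so the caller would redirect that to a VCF file.
    """
    return [
        "java", "-jar", varscan_jar,
        "mpileup2snp", mpileup_path,
        "--p-value", "0.05",
        "--min-coverage", "10",
        "--min-var-freq", "0.95",
        "--output-vcf", "1",   # assumption: 1 switches on VCF output
    ]


if __name__ == "__main__":
    print(" ".join(varscan_snp_command("MFB1.mpileup")))
```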

Results

The FERA samples contain fewer sequencing reads than the fruiting body samples, mixed material samples and KW1 (Table 1). In sample KW1, 32% of genome positions are covered at >=10X, compared with 0.17-0.27% for the FERA samples. The remaining samples range from 13% (AT2) to 32% (Upton) (Table 1). This suggests that only limited SNP calling is possible in the FERA samples. The KW1 genome assembly has a size of 65Mb and the transcriptome assembly a size of ~35Mb, suggesting that spliced transcripts represent approximately 55% of the genome size.
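The coverage percentages quoted here are of the following form: the number of genome positions with a read depth of at least 10, divided by the total number of positions. A minimal sketch is shown below; the function name and toy depths are illustrative, and in practice the depths might come from a per-position depth listing such as the output of samtools depth (an assumption, since the tool used for this step is not stated).

```python
def percent_covered(depths, genome_size, min_depth=10):
    """Percentage of genome positions covered at >= min_depth.

    depths      : iterable of per-position read depths (positions that are
                  missing from the input are treated as zero coverage)
    genome_size : total number of positions in the reference assembly
    """
    covered = sum(1 for depth in depths if depth >= min_depth)
    return 100.0 * covered / genome_size


if __name__ == "__main__":
    # Toy example: five reported positions in a ten-position "genome"
    toy_depths = [12, 3, 10, 0, 25]
    print(f"{percent_covered(toy_depths, genome_size=10):.1f}% at >=10X")
```

Passing the genome size separately matters because depth listings often omit positions with no coverage, which would otherwise inflate the percentage.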

| Sample | Sample type | Raw reads (x2) | GC% | Filtered reads (x2) | % of KW1 genome positions covered at >=10X |
|---|---|---|---|---|---|
| KW1 | C. fraxinea | 41,036,951 | 49 | 38,594,279 | 32% |
| FERA 105 | C. fraxinea | 1,442,015 | 48 | 1,440,015 | 0.26% |
| FERA 88 | C. fraxinea | 1,731,461 | 47 | 1,729,869 | 0.20% |
| FERA 93 | C. fraxinea | 1,355,421 | 47 | 1,353,849 | 0.24% |
| FERA 94 | C. fraxinea | 1,642,453 | 47 | 1,639,096 | 0.17% |
| FERA 232 | C. fraxinea | 1,919,713 | 48 | 1,918,416 | 0.27% |
| FERA 233 | C. fraxinea | 1,838,409 | 47 | 1,837,097 | 0.26% |
| Mature fruiting body (MFB1) | C. fraxinea | 15,033,876 | 49 | 14,524,487 | 27% |
| Primordial fruiting body (PFB1) | C. fraxinea | 7,298,316 | 49 | 7,187,891 | 16% |
| AT1 | Mixed material | 31,059,879 | 48 | 30,388,033 | 29% |
| AT2 | Mixed material | 23,096,021 | 44 | 21,924,824 | 13% |
| Holt | Mixed material | 36,960,651 | 49 | 36,231,834 | 21% |
| Upton | Mixed material | 43,099,662 | 48 | 41,367,071 | 32% |

Table 1:  Read number pre- and post-filtering, G+C content of raw data and percentage of KW1 genome positions covered at a depth of >=10X.

SNP calling using VarScan revealed a high level of genetic variation between the samples (Table 2). The mature fruiting body (MFB1) and primordial fruiting body (PFB1) samples showed a high level of variation when aligned to the KW1 genome (~12,000 and ~9,500 SNPs respectively; ~16,000 in total). Samples AT1 (~6,000), AT2 (~3,000), Holt (~14,000) and Upton (~12,000) also suggested a high level of genetic variation, although this could be due to the alignment of non-Chalara reads present in the mixed samples that have a high degree of similarity to regions of the C. fraxinea genome. Such high levels of genetic variation highlight the importance of having multiple genomes available.

| Sample | Sample type | # Unambiguous bases | # SNPs | VarScan total |
|---|---|---|---|---|
| KW1 [1] | C. fraxinea | 19,974,663 | 59 | NA |
| FERA 88 | C. fraxinea | 167,378 | 40 | 204 (all FERA samples) |
| FERA 93 | C. fraxinea | 127,827 | 20 | |
| FERA 94 | C. fraxinea | 150,981 | 47 | |
| FERA 105 | C. fraxinea | 104,667 | 22 | |
| FERA 232 | C. fraxinea | 169,295 | 53 | |
| FERA 233 | C. fraxinea | 165,989 | 35 | |
| MFB1 | C. fraxinea | 16,519,209 | 11,852 | 16,396 (MFB1 and PFB1) |
| PFB1 | C. fraxinea | 9,869,668 | 9,545 | |
| AT1 [2] | Mixed material | 18,108,777 | 6,029 | NA |
| AT2 [2] | Mixed material | 8,016,701 | 3,138 | |
| Holt | Mixed material | 12,975,549 | 13,654 | |
| Upton | Mixed material | 19,789,944 | 12,259 | |

Table 2: VarScan SNP calling of C. fraxinea isolates, fruiting body and mixed material samples. [1] MAT1-1 mating type; [2] MAT1-2 mating type; UAP: unambiguous positions.

We identified 40, 20, 47, 22, 53 and 35 single-nucleotide differences (10X coverage, 95% consensus) in samples FERA 88, FERA 93, FERA 94, FERA 105, FERA 232 and FERA 233 respectively when compared to the KW1 genome. In total, 204 unique SNPs were identified in the FERA samples when multi-sample calling was used. P-values were calculated using a Fisher’s Exact Test on the read counts supporting reference and variant alleles. Of the 204 SNPs, 165 were located within an annotated gene, 155 of these within annotated exons. These 155 SNPs were distributed across 70 genes, which we are currently annotating (BLAST and PFAM).
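To make the p-value step concrete, here is a small pure-Python sketch of a one-sided Fisher's exact test on a 2x2 table of read counts. The way the comparison row is built (expected counts under an assumed sequencing error rate of 0.001) is an illustrative assumption and not a statement of how VarScan constructs its table internally; the function names are hypothetical.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the probability, with all margins fixed, of seeing a value in
    the top-left cell that is at least as large as a.
    """
    row1 = a + b
    col1 = a + c
    total = a + b + c + d
    upper = min(row1, col1)
    favourable = sum(
        comb(row1, k) * comb(total - row1, col1 - k) for k in range(a, upper + 1)
    )
    return favourable / comb(total, col1)


def snp_p_value(ref_reads, var_reads, error_rate=0.001):
    """P-value for a variant given read counts supporting each allele.

    Assumption: the observed counts are compared with the counts expected
    at the same depth if variant reads arose only from sequencing error.
    """
    depth = ref_reads + var_reads
    expected_var = round(depth * error_rate)
    expected_ref = depth - expected_var
    return fisher_exact_greater(var_reads, ref_reads, expected_var, expected_ref)


if __name__ == "__main__":
    # Ten reads, all supporting the variant allele
    print(snp_p_value(ref_reads=0, var_reads=10))
```

With ten reads that all support the variant, the sketch gives a p-value well below the 0.05 threshold, whereas a position with no variant-supporting reads gives 1.0.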

One of these genes (CHAFR746836.1.1_0031990) has a sequence that is similar (62% identity) to cerato-platanin from the basidiomycete Trametes versicolor (GenBank: EIW62259.1). Cerato-platanin was first identified in the ascomycete Ceratocystis fimbriata, the causal agent of “canker stain disease” in Platanus x acerifolia in Europe (Pazzagli, et al., 1999; Pazzagli, et al., 2006). It belongs to a family of cerato-platanin phytotoxic proteins, which are found in the cell wall of the fungus (Boddi, et al., 2004) and are involved in the host-pathogen interaction (Pazzagli, et al., 1999). Six SNPs were identified in this gene. All the FERA samples except FERA 88 and FERA 105 differ from KW1 at all six SNPs; FERA 88 and FERA 105 are identical to KW1 at all six SNPs (Fig 1). At these six SNPs, the primordial fruiting body sample resembles FERA 105 and FERA 88 and differs from FERA 93, FERA 232, FERA 233, FERA 94 and the mature fruiting body sample. The two fruiting body samples originate from different sources. In the mature fruiting body sample, which is likely to contain both dikaryotic hyphae and haploid ascospores, all six SNPs appear to be heterokaryotic. In all other samples the SNPs are homokaryotic/homozygous.
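The distinction between homokaryotic and apparently heterokaryotic sites can be illustrated with a simple classification on the variant allele frequency at a position. The 10X coverage and 95% thresholds come from the calling parameters in this report; the lower bound of 20% for a possible heterokaryotic site, the function name and the example read counts are all hypothetical choices made for illustration.

```python
def classify_site(ref_reads, var_reads, min_coverage=10,
                  hom_freq=0.95, het_min_freq=0.20):
    """Label one genome position from its allele-supporting read counts.

    hom_freq     : variant frequency at or above which the site is treated
                   as a homokaryotic/homozygous variant (from the report)
    het_min_freq : hypothetical lower bound for calling a mixed site
    """
    depth = ref_reads + var_reads
    if depth < min_coverage:
        return "low coverage"
    freq = var_reads / depth
    if freq >= hom_freq:
        return "homokaryotic variant"
    if freq >= het_min_freq:
        return "possibly heterokaryotic"
    return "reference-like"


if __name__ == "__main__":
    # Invented read counts, for illustration only
    examples = {
        "all variant reads": (0, 30),
        "mixed reads": (14, 16),
        "all reference reads": (30, 0),
    }
    for label, (ref, var) in examples.items():
        print(label, "->", classify_site(ref, var))
```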


Fig 1: IGV view of the putative cerato-platanin gene in C. fraxinea. The six SNPs called by VarScan are shown in the top track. The next four tracks show the homokaryotic/homozygous SNPs, which are the same in FERA 93, FERA 232, FERA 233 and FERA 94; the sixth track shows possible heterokaryotic SNPs in the mature fruiting body (MFB1); and the last three tracks show that FERA 88, FERA 105 and the primordial fruiting body (PFB1) are homokaryotic/homozygous for the alleles found in the KW1 genome sequence.

 

Normalisation of the identified SNPs shows that the highest variation occurs in the fruiting body samples, Holt and Upton when compared to the KW1 genome (Fig 2). Further analysis is required to understand the significance of this difference in variation and whether the Upton and Holt variation is due to the presence of other fungi, such as Phytophthora sp. and Togninia sp., in the mixed samples interfering with the SNP calling.
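The normalisation can be reproduced from the Table 2 figures by dividing each SNP count by the number of bases with >=10X coverage. The sketch below expresses the result as SNPs per megabase; that unit is an assumption made for readability, as the scale used in the original figure is not stated.

```python
# (bases with >=10X coverage, SNPs called) for each sample, from Table 2
TABLE2 = {
    "KW1": (19_974_663, 59),
    "FERA 88": (167_378, 40),
    "FERA 93": (127_827, 20),
    "FERA 94": (150_981, 47),
    "FERA 105": (104_667, 22),
    "FERA 232": (169_295, 53),
    "FERA 233": (165_989, 35),
    "MFB1": (16_519_209, 11_852),
    "PFB1": (9_869_668, 9_545),
    "AT1": (18_108_777, 6_029),
    "AT2": (8_016_701, 3_138),
    "Holt": (12_975_549, 13_654),
    "Upton": (19_789_944, 12_259),
}


def snps_per_mb(n_snps, n_bases):
    """SNPs per million bases covered at >=10X."""
    return n_snps / n_bases * 1_000_000


rates = {name: snps_per_mb(snps, bases) for name, (bases, snps) in TABLE2.items()}
ranked = sorted(rates, key=rates.get, reverse=True)

if __name__ == "__main__":
    for name in ranked:
        print(f"{name:10s} {rates[name]:8.1f} SNPs per Mb")
```

On these figures the four highest normalised rates belong to Holt, PFB1, MFB1 and Upton, in line with the statement above, and KW1 has the lowest.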

 


Fig 2: Number of SNPs normalised to the number of bases with >=10X coverage.

References

Boddi, S., et al. (2004) Cerato-platanin protein is located in the cell walls of ascospores, conidia and hyphae of Ceratocystis fimbriata f. sp. platani, FEMS Microbiology Letters, 233, 341-346.

Koboldt, D.C., et al. (2012) VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing, Genome Research, 22, 568-576.

Pazzagli, L., et al. (1999) Purification, characterization, and amino acid sequence of cerato-platanin, a new phytotoxic protein from Ceratocystis fimbriata f. sp. platani, The Journal of Biological Chemistry, 274, 24959-24964.

Pazzagli, L., et al. (2006) Cerato-platanin, the first member of a new fungal protein family: cloning, expression, and characterization, Cell Biochemistry and Biophysics, 44, 512-521.

Trapnell, C., Pachter, L. and Salzberg, S.L. (2009) TopHat: discovering splice junctions with RNA-Seq, Bioinformatics, 25, 1105-1111.