Chalara ash dieback was first confirmed in the natural environment in the UK in late autumn, based on samples from Ashwellthorpe Wood near Norwich. Given the emergency nature of the problem, we decided early on that speed would be the critical driver of this project: we would generate genetic sequences as rapidly as possible, release them to the community, and launch the crowdsourcing exercise we have been publicizing since Friday.
The normal procedure would be to culture the pathogen, then sequence its genome, and its transcriptome from cultured material and controlled laboratory infections. Here, we took the unusual step of directly sequencing the “interaction transcriptome” of a lesion dissected from an infected ash twig. Skipping standard laboratory culturing made this the fastest way to generate useful information: it is the shortest route from the wood to the sequencer to the computer. The question many of you must be asking is: how useful are these data? This post addresses that question and summarizes the preliminary analyses that the TSL team has produced.
How much of the data is of fungal origin?
Diane Saunders’ analysis indicates that ~30% of the assembled transcripts have top hits to fungal sequences. The true proportion of fungal sequences is probably even higher: >15% of the assembled contigs do not hit any known sequence, and some of these are likely to be fungal.
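The reasoning above gives a range rather than a single number: contigs with a fungal top hit set a lower bound, and if every no-hit contig were also fungal, that would set an upper bound. A minimal sketch, assuming each contig has already been assigned the taxonomic group of its best hit (the group label "Fungi" and the toy counts in the test are hypothetical, not the AT1 data):

```python
from collections import Counter


def fungal_fraction_bounds(top_hits):
    """Return (lower, upper) bounds on the fungal fraction of an assembly.

    top_hits maps each contig ID to the taxonomic group of its best hit
    (hypothetical labels such as "Fungi"), or None when the contig had
    no hit to any known sequence.
    """
    total = len(top_hits)
    if total == 0:
        raise ValueError("no contigs supplied")
    counts = Counter(top_hits.values())
    fungal = counts["Fungi"]   # Counter returns 0 for missing keys
    no_hit = counts[None]
    # Lower bound: only contigs whose top hit is fungal.
    lower = fungal / total
    # Upper bound: assume every no-hit contig is also fungal.
    upper = (fungal + no_hit) / total
    return lower, upper
```

With 30 fungal, 16 no-hit and 54 plant contigs out of 100, the fungal share lies somewhere between 30% and 46%.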
In our experience with Phytophthora pathosystems, transcriptome sequencing of infected plant tissue typically yields only 1-2% pathogen sequences, although up to 20% can be recovered. Thus, the ~30% fungal sequences recovered here is unusually high. Perhaps ash pith yields less RNA than leaves or roots, so that plant transcripts ended up underrepresented in the sample.
What proportion of the fungal sequences are from Chalara fraxinea?
A BLASTN search of the 116 C. fraxinea sequences in GenBank against the AT1 assembly was very informative: the AT1 fungal sequences are mostly C. fraxinea transcripts (or come from a very closely related taxon). For example, the cobalamin-independent methionine synthase-like protein gene matched AT1 at 565 of 569 nucleotides (99%), and the elongation factor-1 alpha (EF1a) gene at 768 of 768 (100%).
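A search like this can be summarized by keeping the best hit per GenBank query and converting percent identity back into identical bases. A minimal sketch, assuming the search was run with BLAST+ tabular output (`-outfmt 6`, default 12 columns); the query and contig names and the scores in the test are hypothetical:

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle):
    """Keep the highest-bitscore hit for each query in -outfmt 6 output."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(FIELDS, row))
        hit["pident"] = float(hit["pident"])
        hit["length"] = int(hit["length"])
        hit["bitscore"] = float(hit["bitscore"])
        query = hit["qseqid"]
        if query not in best or hit["bitscore"] > best[query]["bitscore"]:
            best[query] = hit
    return best


def identical_positions(hit):
    """Approximate identical bases from percent identity and alignment length."""
    return round(hit["pident"] * hit["length"] / 100)
```

For instance, a 569-nt alignment at 99.30% identity corresponds to 565 identical nucleotides, matching the methionine synthase example above.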
If the fungal sequences came from a mixture of taxa, we would expect the genes above to have multiple divergent hits in AT1. This generally does not seem to be the case. There are likely sequences from organisms other than C. fraxinea and ash, but perhaps these were not abundant enough to assemble into long contigs.
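The mixed-taxa check described above can be expressed as a simple filter: for each query gene, collect the good hits and flag genes hit by several contigs at clearly different identities. A minimal sketch, assuming hits are available as (query, subject, percent identity) tuples; the thresholds `min_identity` and `max_gap` and the gene names in the test are hypothetical choices, not values used in our analysis:

```python
from collections import defaultdict


def divergent_queries(hits, min_identity=80.0, max_gap=3.0):
    """Flag queries hit by several contigs whose identities differ by > max_gap.

    hits: iterable of (qseqid, sseqid, pident) tuples, e.g. parsed from
    BLAST -outfmt 6. One near-identical hit per query fits a single
    dominant taxon; several good hits at clearly different identities
    could point to a mix of related taxa.
    """
    per_query = defaultdict(dict)
    for qseqid, sseqid, pident in hits:
        if pident < min_identity:
            continue  # ignore weak hits that are unlikely to be orthologs
        # Keep the best identity seen for each query/subject pair.
        previous = per_query[qseqid].get(sseqid, 0.0)
        per_query[qseqid][sseqid] = max(previous, pident)

    flagged = {}
    for qseqid, by_subject in per_query.items():
        identities = sorted(by_subject.values(), reverse=True)
        if len(identities) > 1 and identities[0] - identities[-1] > max_gap:
            flagged[qseqid] = identities
    return flagged
```

A flagged gene is only a hint: paralogs within a single genome, or fragmented assemblies of the same transcript, can also produce multiple hits.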
How good is the AT1 assembly?
RNAseq assemblies of short reads vary tremendously in quality. The AT1 assembly appears to contain a reasonable proportion of contigs spanning full-length coding sequences (CDS). For example, comp1171, a fungal polyketide synthase, is 7724 bp long and includes a full-length CDS of 2479 amino acids.
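One crude way to screen contigs for full-length CDS is to look for the longest open reading frame running from an ATG to an in-frame stop codon on either strand. A minimal sketch, assuming the standard genetic code and plain nucleotide strings; this is an illustrative stand-in, not the tool used for AT1:

```python
STOPS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Reverse complement of an upper- or lower-case DNA string."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def longest_orf(seq):
    """Return (strand, start, end, n_aa) for the longest ATG...stop ORF, or None.

    Coordinates are 0-based and end-exclusive, include the stop codon, and
    refer to the reverse-complemented sequence when strand is "-".
    n_aa counts amino acids, including the initial Met, excluding the stop.
    """
    seq = seq.upper()
    best = None
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens a candidate ORF
                elif start is not None and codon in STOPS:
                    n_aa = (i - start) // 3
                    if best is None or n_aa > best[3]:
                        best = (strand, start, i + 3, n_aa)
                    start = None  # look for the next ORF in this frame
    return best
```

As a sanity check on the comp1171 figures: a 2479-amino-acid CDS needs (2479 + 1) × 3 = 7440 nt including the stop codon, which fits within the 7724 bp contig. Contigs lacking either an ATG or an in-frame stop are not reported by this approach, which is exactly the behavior wanted when screening for complete CDS.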
Are there interesting genes you could already highlight?
A full-length Nep1-like protein (NLP, comp507) with similarity to actinoporin toxins is highlighted here. The polyketide synthase comp1171 mentioned above could also synthesize a toxin. In addition, comp8971 encodes a full-length secreted protein with four LysM domains and belongs to a well-known family of fungal effectors.
How do you know these genes are from C. fraxinea?
At this point we don’t know for sure, although the sequences likely originate from C. fraxinea given the absence of divergent hits for the housekeeping genes noted above. Once we have pure cultures and genomic reads, we will be able to address this. Of course, those of you who have already generated genomic sequences of C. fraxinea could easily answer this question.
It would be informative if some of the interesting sequences, such as the candidate toxins, turned out to originate from another organism, especially if that organism is consistently associated with Chalara ash dieback. Perhaps this is a complex pathosystem that involves multiple organisms. We just know very little at this point.
Next come more ash dieback transcriptomes from independent samples and genome sequences of British isolates. This is all work in progress, and we will post the data on OADB as soon as they are available.
A more general lesson?
There is a lesson from this first dataset: whenever new or suspected plant diseases arise, we should immediately sequence transcriptomes from field-collected diseased tissue. These days the cost of an RNAseq lane is reasonable and the assemblies are pretty decent. The resulting data should rapidly provide valuable information about the nature of the pathogen and an initial insight into its genes. Whenever time is of the essence, transcriptome sequencing should start as soon as possible.
Posted by @KamounLab