Question: Sorting reads from host-pathogen interaction
11 days ago, cwbenson1993 wrote:

I am working on RNA-seq data for a host-pathogen interaction between a grass species and its fungal parasite. The ultimate goal is to do differential expression analysis and functional enrichment to see which genes and pathways are involved in parasitism.

I have:

  1. Draft genome of the fungus
  2. RNA-seq reads from non-infected grass
  3. RNA-seq reads from infected grass (contains grass and fungal transcripts)
  4. RNA-seq reads from the fungus growing in culture

I built the fungal transcriptome using only the reads from the culture-grown fungus, and I built the grass transcriptome using only the non-infected reads. Now I'm thinking it would be useful to rebuild those transcriptomes to include reads from the infected tissue, in order to capture transcripts that are unique to the host-pathogen interaction.

Is there a way to filter the infected reads into grass and fungal groups using the resources I currently have?

Perhaps I could align the infected grass reads (#3) to the fungal transcriptome and use only the unmapped reads to rebuild the grass transcriptome? Maybe I could then run BLAST, BBDuk, or some other tool on the unmapped reads to filter out remaining fungal reads before using them to build the grass transcriptome.
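
To make the second-pass idea concrete, here is a toy Python sketch of k-mer based filtering, which is the general principle a tool like BBDuk works on. It is not BBDuk itself and is far too slow for real data; the function names, the value of k, and the sequences are all made up for illustration. A read is flagged as fungal-like if it shares at least one k-mer (on either strand) with the fungal reference.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (unknown bases become N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))


def kmers(seq, k):
    """Return the set of all k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_reference_kmers(sequences, k):
    """Collect k-mers from both strands of every reference sequence."""
    ref = set()
    for s in sequences:
        ref |= kmers(s, k)
        ref |= kmers(reverse_complement(s), k)
    return ref


def split_reads_by_kmer(reads, ref_kmers, k, min_hits=1):
    """Split (name, sequence) reads into those sharing >= min_hits k-mers
    with the reference and those that do not."""
    matched, unmatched = [], []
    for name, seq in reads:
        hits = sum(1 for km in kmers(seq, k) if km in ref_kmers)
        (matched if hits >= min_hits else unmatched).append(name)
    return matched, unmatched


# Hypothetical toy data: one short "fungal" sequence and three reads.
fungal_ref = ["ATGGCGTACGTTAGCCGATAGGCTA"]
reads = [
    ("read_fungal", "GCGTACGTTAGCC"),                         # forward-strand match
    ("read_fungal_rc", reverse_complement("GTTAGCCGATAGG")),  # reverse-strand match
    ("read_other", "TTTTTTTTCCCCCCCC"),                       # no shared k-mers
]

ref_kmers = build_reference_kmers(fungal_ref, k=11)
matched, unmatched = split_reads_by_kmer(reads, ref_kmers, k=11)
print("fungal-like:", matched)
print("kept for grass assembly:", unmatched)
```

Smaller k or a lower hit threshold removes more fungal reads but also risks discarding grass reads from conserved genes, so the settings are a trade-off rather than a fixed recipe.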


Valid approach indeed. I would consider aligning them to the fungal genome as well, in order to filter out the fungal reads.

written 11 days ago by lieven.sterck

Hey lieven.sterck,

Thanks for the response! I've considered using BBSplit to sort the reads further, but unfortunately I don't have a genomic sequence for the plant.

Does anyone know of a tool that can sort RNA-seq data using the genome of only one of the two species in a host-pathogen pair?

written 10 days ago by cwbenson1993

Can't you just align them to the fungal genome and then use the ones that do not map (i.e. the ones likely to be from the plant)?
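
As a rough sketch of that split, the Python below reads alignments in SAM format and separates read names by the "unmapped" flag bit (0x4). It assumes the infected reads have already been aligned to the fungal genome with a splice-aware aligner such as HISAT2 or STAR; the function name and the toy records are invented for illustration. For paired-end data it is deliberately conservative: a pair is kept as a plant candidate only if neither mate mapped. On real data the same selection is usually done with samtools (for example `samtools view -f 4` to keep unmapped records) rather than hand-written code.

```python
import io

UNMAPPED = 0x4         # SAM flag bit: this read is unmapped
SECONDARY = 0x100      # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary alignment


def split_sam_by_mapping(sam_handle):
    """Return (fungal_names, plant_candidate_names) from SAM text.

    A read name counts as fungal if any primary record for it is mapped,
    so a pair with one mapped mate is removed as a whole.
    """
    mapped, unmapped = set(), set()
    for line in sam_handle:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # only primary records decide the read's fate
        if flag & UNMAPPED:
            unmapped.add(name)
        else:
            mapped.add(name)
    return mapped, unmapped - mapped


# Hypothetical toy alignments against a fungal contig.
toy_sam = io.StringIO(
    "@SQ\tSN:contig1\tLN:1000\n"
    "r1\t0\tcontig1\t10\t60\t5M\t*\t0\t0\tACGTA\tIIIII\n"    # mapped single read
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTTT\tIIIII\n"             # unmapped single read
    "r3\t73\tcontig1\t50\t60\t5M\t=\t50\t0\tACGTA\tIIIII\n"  # mate 1 mapped
    "r3\t133\tcontig1\t50\t0\t*\t=\t50\t0\tGGGGG\tIIIII\n"   # mate 2 unmapped
)

fungal, plant_candidates = split_sam_by_mapping(toy_sam)
print("fungal:", sorted(fungal))
print("plant candidates:", sorted(plant_candidates))
```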

written 10 days ago by lieven.sterck

That would be the way to go.

written 10 days ago by genomax
10 days ago, cwbenson1993 wrote:

It's the novel transcripts that I'm concerned about. If reads don't map to either the fungal or the plant transcriptome, then they presumably correspond to transcripts that are expressed specifically during the host-pathogen interaction, from either the plant or the fungus. For example, if I map the infected grass reads to the fungal transcriptome and use the unmapped reads to build the grass transcriptome, the novel fungal transcripts would still end up in my grass assembly.

I don't know if it's possible to sort the unmapped reads further using the fungal genome, or maybe it's not even worth troubling myself over.

written 10 days ago by cwbenson1993

Not worth troubling yourself over, I would say ;-)

You will likely always end up with more or less of a mixture of sequence origins.

On the other hand, if you map to the fungal genome you should be able to remove all fungal-derived reads (regardless of the stage of infection at which they are expressed), since all of these reads must derive from somewhere in the genome, including the 'novel' ones in your de novo transcriptome. I understand that you only have a draft genome, so some might slip through at this stage, but that's nothing to make a big fuss about, I think.
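
One rough way to gauge how much might slip through, offered as a back-of-the-envelope sketch rather than an established method: map the culture-grown fungal reads (#4) to the draft genome and use their mapping rate as a proxy for how well the draft captures fungal reads in general. The function name and the read counts below are hypothetical, and the estimate assumes that infection-stage fungal transcripts map about as well as culture-stage ones, which may not hold.

```python
def estimate_fungal_carryover(culture_total, culture_mapped, infected_mapped):
    """Estimate how many fungal reads escape a genome-based filter.

    culture_total   -- reads from pure fungal culture (hypothetical count)
    culture_mapped  -- of those, reads that mapped to the draft genome
    infected_mapped -- infected-sample reads that mapped to the draft genome

    Returns (mapping_rate, estimated_unfiltered_fungal_reads).
    """
    if culture_total <= 0 or culture_mapped <= 0:
        raise ValueError("culture read counts must be positive")
    mapping_rate = culture_mapped / culture_total
    # If only a fraction mapping_rate of fungal reads is detectable, the
    # true fungal total in the infected sample is scaled up accordingly.
    estimated_fungal_total = infected_mapped / mapping_rate
    return mapping_rate, estimated_fungal_total - infected_mapped


# Hypothetical counts, for illustration only.
rate, carryover = estimate_fungal_carryover(
    culture_total=1_000_000,
    culture_mapped=950_000,
    infected_mapped=190_000,
)
print(f"mapping rate of culture reads: {rate:.2%}")
print(f"estimated fungal reads left after filtering: {carryover:.0f}")
```

If the estimated carryover is small relative to the number of unmapped reads going into the grass assembly, that supports not worrying about it.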

written 10 days ago by lieven.sterck

Fantastic! Thanks for all the help!

written 10 days ago by cwbenson1993