Question

Filtering reads from plant-associated microbes

4.2 years ago

minions-b • 0

Hi,

I am new to RNA-seq and am trying to analyse the transcriptomes of two tree species: one with a ref. genome and one without. I have some questions that I keep coming back to, and I hope you can help me address them.

Can you explain why a ref. genome + gtf/gff3 is preferred for differential gene expression analysis? Why don't we extract the transcripts from the gtf/gff3, build a list of the transcripts in the genome, and use it the way we would use a de novo transcriptome assembly? My understanding is that it would be computationally easier (and less resource-intensive?) for a read mapper to map reads to a list of transcripts than to a genome.
In addition to removing rRNA, I usually map reads to a ref. genome to eliminate potential contamination from plant-associated microbes, but I can't do that for the species without a genome, for which I will have to do a de novo transcriptome assembly. For now, I am thinking of using all reads for de novo assembly and differential gene expression analysis, and then blasting the transcripts with significant expression to screen for contamination. Do you think this would be okay? Or should I try to use Blobtools to remove contamination before de novo assembly? This could be overkill, but I don't really know how many reads from the germs make it through. :(

Thank you very much for your help in advance! All comments and suggestions are welcome!

RNA-Seq rna-seq • 604 views

score 0 · Answer 1 · 2020-01-25

ref. genome + gtf/gff3 is preferred for differential gene expression analysis?

As long as both are of reasonably good quality, they represent the ground truth. The reason you always try to map back to the genome, when you can, is that this way you don't force any read to map artificially to a location it did not originate from. Using a reduced representation of the genome (transcripts only) may allow that to happen.
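To make the "forced mapping" point concrete, here is a toy sketch in Python. It is not how a real aligner works (no splicing, no gaps, no mapping qualities); `best_hit`, the sequences, and the locus names are all made up for illustration. The read comes from an unannotated locus that differs from an annotated gene by one base. Against the genome it finds its true origin; against the transcript list alone, the only option is the annotated gene.

```python
def best_hit(read, references, max_mismatches=2):
    """Return (name, offset, mismatches) for the best ungapped placement, or None."""
    best = None
    for name, seq in references.items():
        for i in range(len(seq) - len(read) + 1):
            window = seq[i:i + len(read)]
            mm = sum(a != b for a, b in zip(read, window))
            if mm <= max_mismatches and (best is None or mm < best[2]):
                best = (name, i, mm)
    return best


# Hypothetical sequences: an annotated gene and an unannotated paralog
# that differs from it at a single position.
gene_a = "ACGTTGCAAGGCTTAACCGT"
paralog = "ACGTTGCTAGGCTTAACCGT"

genome = {"geneA_locus": gene_a, "unannotated_locus": paralog}
transcripts = {"geneA": gene_a}  # reduced representation: annotated transcripts only

read = paralog[2:14]  # a read that truly originates from the unannotated locus

print(best_hit(read, genome))       # ('unannotated_locus', 2, 0)
print(best_hit(read, transcripts))  # ('geneA', 2, 1)
```

With the transcript-only reference, the read is accepted with one mismatch and would be counted towards geneA, even though it did not come from there.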

I don't really know how many reads from the germs make it through. :(

One takes reasonable precautions on the experimental side to reduce potential contamination where possible. You can use any available tools that try to remove reads that don't belong. You will still have to try to filter out contigs/transcripts after the assembly. Microbial reads/contigs should be distinguishable from plant ones (with some exceptions, such as chloroplast sequences).
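The post-assembly filtering step could look roughly like the sketch below. It assumes you have already run some similarity search on the contigs and summarised it as a tab-separated table with three columns: contig ID, bitscore, and a taxonomic label for the hit. That column layout, the label names, and the function `split_contigs_by_best_hit` are hypothetical and not the output of any particular tool.

```python
import csv
import io


def split_contigs_by_best_hit(hits_tsv, keep_labels=("Viridiplantae",)):
    """Split contig IDs into (keep, flagged) sets using each contig's top-scoring hit."""
    best = {}  # contig ID -> (bitscore, label) of the highest-scoring hit seen so far
    for row in csv.reader(io.StringIO(hits_tsv), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comments
        contig, bitscore, label = row[0], float(row[1]), row[2]
        if contig not in best or bitscore > best[contig][0]:
            best[contig] = (bitscore, label)
    keep = {c for c, (_, label) in best.items() if label in keep_labels}
    flagged = set(best) - keep
    return keep, flagged


# Made-up example table: contig, bitscore, taxonomic label of the hit.
hits = (
    "c1\t250.0\tViridiplantae\n"
    "c1\t90.0\tBacteria\n"
    "c2\t310.0\tBacteria\n"
    "c3\t120.0\tFungi\n"
)

keep, flagged = split_contigs_by_best_hit(hits)
print(sorted(keep))     # ['c1']
print(sorted(flagged))  # ['c2', 'c3']
```

Contigs with no hit at all never appear in the table, so they end up in neither set and need a separate decision. Flagged contigs are better reviewed than deleted blindly, since chloroplast-derived contigs can look bacterial.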