What should I do first: transcriptome annotation or DE analysis?
3.9 years ago
vinicius ▴ 10

Hello, I've made a de novo transcriptome assembly of an invertebrate organism and have run some downstream analyses. I first ran a differential gene expression analysis with DESeq2 and found that ~3000 transcripts were called differentially expressed (adj-pval < 0.001 and |LFC| >= 2). But when I annotated these transcripts with Trinotate, I saw that many of them were contaminants such as fungi, bacteria, parasites and others.

I don't think I can simply ignore these contaminants, select only the transcripts matching the organism of interest, and carry on with downstream analyses such as GO enrichment. My DE analysis would be biased if I did that, right?

So I inverted the order: I ran annotation first, then removed contaminants, and then selected only the transcripts matching a sequence from the organism of interest (~5000) for DE analysis. But doing this, I got about 30 DE transcripts with the same thresholds, and even with the thresholds relaxed to adj-pval < 0.05 and |LFC| >= 1, I got ~120 transcripts.
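For reference, the two threshold sets above can be compared with a small helper. This is only a sketch: it assumes the DESeq2 results table has been exported and loaded as a list of rows with the standard `padj` and `log2FoldChange` columns, and the function name `count_de` and the toy values are hypothetical, not real data from this analysis.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def count_de(results: List[Row], padj_max: float, lfc_min: float) -> int:
    """Count rows with padj < padj_max and |log2FoldChange| >= lfc_min."""
    n = 0
    for row in results:
        padj = row["padj"]
        lfc = row["log2FoldChange"]
        # DESeq2 reports NA for some rows (e.g. after independent filtering);
        # here those are represented as None and skipped.
        if padj is None or lfc is None:
            continue
        if padj < padj_max and abs(lfc) >= lfc_min:
            n += 1
    return n


# Toy rows for illustration only (hypothetical values).
results: List[Row] = [
    {"padj": 0.0002, "log2FoldChange": 3.1},
    {"padj": 0.02, "log2FoldChange": -1.4},
    {"padj": 0.0005, "log2FoldChange": 0.6},
    {"padj": None, "log2FoldChange": 2.5},
]

strict = count_de(results, padj_max=0.001, lfc_min=2.0)
loose = count_de(results, padj_max=0.05, lfc_min=1.0)
print(strict, loose)
```

Counting both ways on the same table makes it easy to see how much of the difference comes from the thresholds and how much from the set of transcripts that was tested.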

Is it normal for the number of differentially expressed genes to be around 100 even with loose thresholds? Or should I run the DE analysis on all assembled transcripts? Most papers I read seem to have done DE analysis before annotation, but I didn't see anything about removing contaminants.

If anyone has any tips for me, I would be very thankful!

rna-seq assembly transcriptome annotation • 669 views
3.9 years ago

You definitely should remove contaminants before doing your DE analysis. Otherwise you are just testing which samples have more contamination, which will bias your analysis and make it untrustworthy.
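A minimal sketch of that filtering step, assuming the raw counts are held as a mapping from transcript ID to per-sample counts and that a taxon label has already been extracted for each annotated transcript (for example from the Trinotate report). The function name `filter_counts_by_taxon`, the transcript IDs, the taxon labels and the counts are all hypothetical.

```python
from typing import Dict, List, Optional, Set


def filter_counts_by_taxon(
    counts: Dict[str, List[int]],
    taxon_by_transcript: Dict[str, Optional[str]],
    keep_taxa: Set[str],
) -> Dict[str, List[int]]:
    """Keep only transcripts whose annotated taxon is in keep_taxa.

    Transcripts with no annotation are dropped too, which mirrors the
    approach in the question (keep only hits to the organism of interest).
    """
    return {
        tid: row
        for tid, row in counts.items()
        if taxon_by_transcript.get(tid) in keep_taxa
    }


# Toy raw count matrix: transcript ID -> counts per sample (hypothetical).
counts = {
    "TRINITY_DN1_c0_g1_i1": [120, 98, 15, 22],
    "TRINITY_DN2_c0_g1_i1": [0, 3, 450, 512],
    "TRINITY_DN3_c0_g1_i1": [33, 41, 39, 36],
}

# Taxon assigned to each annotated transcript (hypothetical labels).
taxon_by_transcript = {
    "TRINITY_DN1_c0_g1_i1": "Metazoa",
    "TRINITY_DN2_c0_g1_i1": "Fungi",
}

filtered = filter_counts_by_taxon(counts, taxon_by_transcript, {"Metazoa"})
print(sorted(filtered))
```

The filtered raw counts, not a subset of an earlier results table, are what would then go into DESeq2, so that normalization and multiple-testing correction are computed only over the retained transcripts.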

With regard to the number of differentially expressed features, there are no guidelines - it depends on how different the conditions are (think cancer vs normal (very different) or DMSO vs water (very similar)) as well as the power of your study (number of replicates). The best you can do is look for genes you know or suspect are different based on other observations.

Thank you so much!!!
