Can I perform differential gene expression with samples from Poly-A and Ribo-depletion?
5.2 years ago
dganiewich ▴ 130

Hi, I have a question regarding differential gene expression analysis when the samples were prepared with different library methods.

I have 10 samples in total, 3 from one condition and 7 from the other. Among the first 3, 2 were sequenced using poly-A selection and one using ribo-depletion library preparation. In the other group I also have a mixture of the two, but I could obtain more samples and make them all from the same method (either one).

My question is whether it is still possible to perform differential gene expression analysis, or whether the library preparation method is such a large source of bias that the results would be unreliable.

Thank you very much in advance! Best, Daiana

RNA-Seq differential expression • 1.8k views
5.2 years ago

That should not be a problem since you have both library types in both conditions; you just need to correct for it. For DE analysis you can add the library type to your model as a batch effect (i.e. a covariate). You do not test that term; it is only there to absorb the library-related variation. For other analyses (PCA, clustering, heatmaps, etc.) you can remove the effect using either limma::removeBatchEffect() or sva::ComBat() before doing the analysis.
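To make the idea concrete, here is a minimal sketch in plain Python of what "library type as a covariate" means for a single gene. It is not how limma, sva, or any DE package is implemented; the sample layout, the expression values, and the helper names `solve` and `fit_ols` are all made up for illustration. The model is log-expression = intercept + condition effect + library effect, fitted by ordinary least squares.

```python
def solve(a, b):
    """Solve a small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(design, y):
    """Ordinary least squares via the normal equations (fine for tiny designs)."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical layout echoing the question: 3 samples in condition A, 7 in B,
# with both library types present in both conditions.
condition = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]   # 0 = condition A, 1 = condition B
library = [0, 0, 1, 0, 0, 0, 1, 1, 1, 1]     # 0 = poly-A, 1 = ribo-depletion

# Made-up log2 expression for one gene: true condition effect 2.0, library effect 1.5.
log_expr = [5.0 + 2.0 * c + 1.5 * l for c, l in zip(condition, library)]

# Design matrix: intercept, condition, library type (the "batch" covariate).
design = [[1.0, float(c), float(l)] for c, l in zip(condition, library)]
intercept, condition_effect, library_effect = fit_ols(design, log_expr)

# For plots only: subtract the fitted library effect (centred so the overall mean is kept).
mean_library = sum(library) / len(library)
corrected = [y - library_effect * (l - mean_library) for y, l in zip(log_expr, library)]
```

The condition effect is estimated while the library term soaks up the library-related shift, which is the role of the covariate in the DE model. The `corrected` values are only meant for visualisation (PCA, heatmaps); the DE test itself should use the uncorrected data with the covariate in the model. This only works because library type is not confounded with condition, and a simple additive shift like this may not capture everything that differs between the two protocols.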

Thank you kristoffer! I will definitely use this!

Update below: even batch effect removal was not enough.

5.1 years ago
dganiewich ▴ 130

For anyone that encounters a similar situation:

Even after batch effect removal, the library methods are so different that it was not possible to compare them after all. QC plots showed that samples still clustered by library method, separated from the matching sample of the same origin. This paper helped me better understand the differences and the reasons why this was happening: https://www.nature.com/articles/s41598-018-23226-4 In summary, I concluded that the main source of bias is the percentage of reads from exonic/mature transcripts in each library type.

If anyone comes up with a different idea I'd be delighted to hear it.

Best, Daiana

Loads of suggestions: Is the clustering by dendrogram or PCA/MDS? How does it look if you subset to only protein-coding genes? Are there then large differences in the total number of reads (pairs) mapped with the two library types? What if you do the clustering on the top 500 most variable genes (from a log-transformed abundance matrix)?
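The last suggestion can be sketched roughly as follows in plain Python. The function name `top_variable_genes`, the pseudocount of 1, and the toy abundance values are assumptions for illustration, not part of any package; a real analysis would run this on the full abundance matrix and feed the result into PCA or hierarchical clustering.

```python
import math
from statistics import variance


def top_variable_genes(abundance, n_top=500, pseudocount=1.0):
    """Log-transform each gene and keep the n_top genes with the highest variance.

    abundance maps gene name -> list of per-sample abundances (same sample order for all genes).
    """
    logged = {
        gene: [math.log2(value + pseudocount) for value in values]
        for gene, values in abundance.items()
    }
    # Rank genes by sample variance of the log-transformed values, highest first.
    ranked = sorted(logged, key=lambda gene: variance(logged[gene]), reverse=True)
    return {gene: logged[gene] for gene in ranked[:n_top]}


# Toy matrix with made-up values: 4 genes across 4 samples.
abundance = {
    "geneA": [10, 12, 11, 9],
    "geneB": [0, 200, 5, 150],
    "geneC": [50, 50, 50, 50],
    "geneD": [1, 3, 30, 2],
}

selected = top_variable_genes(abundance, n_top=2)
```

Restricting `abundance` to protein-coding genes before calling the function would combine this with the subsetting suggestion; the list of protein-coding gene IDs would have to come from your annotation.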

5.2 years ago
skbrimer ▴ 740

I bet this gets flagged as off topic, since it is more a question of experimental design than strictly a bioinformatics question. With that said, I found this paper that compares the two: https://www.sciencedirect.com/science/article/pii/S0888754310001746 and it looks like it should be okay to move forward with your analysis. It also depends (like everything in biology) on your experiment and what question you are asking. Sorry if this is not that helpful, good luck.

Thank you for your comment skbrimer!

5.2 years ago
Tm ★ 1.1k

If the aim of your study is to find differentially expressed mRNAs, then you can definitely compare both types of libraries, as both enrich for mRNA. But if you are looking at the expression of specific non-coding RNAs, then it would be difficult.

Great point. It is indeed for mRNA. Thanks
