QC RNA-Seq (Duplicates)
2.5 years ago

I have collected a lot of RNA-seq (cancer) data from different sources to use in standardising a differential expression analysis pipeline. Many samples (>50) show high duplication levels (80-90%), and the total number of reads is also very high (around 150-250 million). Is there a set cut-off for duplication levels in RNA-seq? I have searched the literature, but what I found did not help much. It would be a huge help if anyone could suggest literature or another source where I can find answers. Thanks in advance!

RNA-seq QC Duplicates • 766 views

Is there a set cut-off for Duplication levels in RNA-seq?

No, there is none. Unless you can distinguish optical/PCR duplicates from natural ones (which, for PCR duplicates, requires UMIs), you can't decide whether a duplicate read is a real copy or a sequencing artifact. There is a study reporting that most duplicates in RNA-seq data are real (LINK).
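
To illustrate why UMIs matter here, below is a minimal Python sketch (not a replacement for a real deduplication tool). It assumes each read has already been reduced to a hypothetical `(chrom, pos, strand, umi)` tuple; the function name `duplication_rate` and the toy reads are made up for illustration. Without UMIs, every read sharing mapping coordinates with another is counted as a duplicate; with UMIs, reads at the same position but with different UMIs are treated as distinct molecules.

```python
def duplication_rate(reads, use_umi=False):
    """Return the fraction of reads that would be flagged as duplicates.

    reads: list of (chrom, pos, strand, umi) tuples (hypothetical layout).
    use_umi: if True, the UMI is part of the identity of a molecule.
    """
    if not reads:
        return 0.0
    # Build the key that defines "the same fragment" under each scheme.
    keys = [
        (chrom, pos, strand, umi) if use_umi else (chrom, pos, strand)
        for chrom, pos, strand, umi in reads
    ]
    unique = len(set(keys))
    return 1.0 - unique / len(reads)


# Toy example: three reads at the same position, two of them sharing a UMI.
reads = [
    ("chr1", 100, "+", "AAC"),
    ("chr1", 100, "+", "GGT"),
    ("chr1", 100, "+", "GGT"),
    ("chr2", 500, "-", "TTA"),
]

print(duplication_rate(reads))                # coordinate-only estimate
print(duplication_rate(reads, use_umi=True))  # UMI-aware estimate
```

In this toy case the coordinate-only estimate is twice the UMI-aware one, which is the same effect that can inflate duplication levels for highly expressed genes in deeply sequenced libraries.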

If you are collecting data from diverse sources, there are likely to be a lot of batch effects. You should be mindful of that possibility if you are using such data for any standardization.


Hi GenoMax, thanks for your reply and the linked article. I am aware of the batch effects (they're a pain), and we are working on resolving them. Thank you for your kind suggestions. :)
