QC RNA-Seq (Duplicates)

2.5 years ago

backpackbio • 0

I have collected a lot of cancer RNA-seq data from different sources to use for standardising a differential expression analysis pipeline. Many samples (>50) show high duplication levels (80-90%), and the total number of reads is also very high (around 150-250 million). Is there a set cut-off for duplication levels in RNA-seq? I have searched the literature, but what I found did not help much. It would be a huge help if anyone could suggest literature or another source where I can find answers. Thanks in advance!

RNA-seq QC Duplicates • 766 views

Is there a set cut-off for Duplication levels in RNA-seq?

No, there is none. Unless you can identify optical/PCR duplicates (which, for PCR duplicates, requires UMIs), you can't tell whether a read is a real copy of a transcript or a sequencing duplicate. There is a study that says most of the duplicate reads in RNA-seq data are real (LINK).
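To illustrate why UMIs matter here, below is a minimal Python sketch, not a real deduplication tool. It assumes each aligned read has already been reduced to a hypothetical `(chrom, start, strand, umi)` tuple; the function name and the toy data are made up for illustration. It compares the duplication rate you get from position alone with the rate you get when reads at the same position but with different UMIs are counted as distinct molecules.

```python
from collections import defaultdict


def duplication_rates(reads):
    """Return (position_only_rate, umi_aware_rate) as fractions of all reads.

    `reads` is an iterable of hypothetical (chrom, start, strand, umi) tuples,
    one per aligned read. UMI sequencing errors are ignored in this sketch.
    """
    umis_at_position = defaultdict(set)
    total = 0
    for chrom, start, strand, umi in reads:
        umis_at_position[(chrom, start, strand)].add(umi)
        total += 1
    if total == 0:
        return 0.0, 0.0

    # Position-only: every extra read at an already-seen position is a duplicate.
    distinct_positions = len(umis_at_position)
    # UMI-aware: reads are duplicates only if position AND UMI both match.
    distinct_molecules = sum(len(umis) for umis in umis_at_position.values())

    position_only = 1 - distinct_positions / total
    umi_aware = 1 - distinct_molecules / total
    return position_only, umi_aware


# Toy data: a highly expressed gene gives many reads at one position,
# but most of them carry different UMIs, i.e. come from different molecules.
reads = [("chr1", 100, "+", umi)
         for umi in ["AAC", "AAC", "GGT", "TCA", "TCA", "TCA", "CAG", "TTG"]]
reads += [("chr2", 500, "-", "ACG"), ("chr2", 900, "-", "ACG")]

pos_rate, umi_rate = duplication_rates(reads)
print(f"position-only: {pos_rate:.0%}, UMI-aware: {umi_rate:.0%}")
```

On this toy input, position alone flags most reads as duplicates, while the UMI-aware count flags far fewer. Real tools such as UMI-tools also account for sequencing errors in the UMIs, which this sketch does not attempt.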

If you are collecting data from diverse sources, there are going to be a lot of batch effects. You should be mindful of that if you are using such data for any standardization.

ADD REPLY • link 2.5 years ago by GenoMax 141k

Hi GenoMax, thanks for your reply and the linked article. I am aware of the batch effects (they're a pain) and we are working on resolving them. Thank you for your kind suggestions. :)

ADD REPLY • link 2.5 years ago by backpackbio • 0
