Duplicate reads in RNA-seq
7.8 years ago
mmrcksn ▴ 50

Hi everyone,

I have some paired-end RNA-seq samples with very high levels of duplication (in the worst cases, only 6% of reads remain after de-duplication). I think this is due to the low input RNA (~1 ng) and to a smaller subset of genes being expressed (the RNA is from a specific cell type isolated from brain). Even after poly-A selection, the most highly expressed gene in my samples was a ribosomal RNA transcript.

I used Picard's MarkDuplicates to remove duplicate reads from my samples and looked at how that affected the counts. I was happy to see that the counts for the rRNA gene were greatly reduced, but the counts for almost every other gene were reduced as well. I had thought that only highly expressed genes would have duplicate reads. I also ran a correlation analysis between the original and de-duplicated samples and saw excellent correlation between them, so now I'm just confused.
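
For anyone wanting to repeat this kind of comparison, here is a minimal sketch of one way to do it: the per-gene fraction of counts retained after de-duplication, plus a Pearson correlation on log2 counts. The function names, the pseudocount of 1, and the gene names and numbers are all made up for illustration; they are not my real data, and this is not necessarily how any particular counting tool would do it.

```python
import math

def retained_fraction(raw, dedup):
    """Fraction of each gene's raw count that survives de-duplication."""
    return {g: dedup.get(g, 0) / raw[g] for g in raw if raw[g] > 0}

def log_pearson(raw, dedup, pseudocount=1.0):
    """Pearson correlation of log2(count + pseudocount) over shared genes."""
    genes = sorted(set(raw) & set(dedup))
    x = [math.log2(raw[g] + pseudocount) for g in genes]
    y = [math.log2(dedup[g] + pseudocount) for g in genes]
    n = len(genes)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Toy counts, invented for illustration only (hypothetical gene names).
raw_counts = {"rRNA_like": 500000, "geneA": 12000, "geneB": 800, "geneC": 40}
dedup_counts = {"rRNA_like": 9000, "geneA": 1500, "geneB": 300, "geneC": 30}

fractions = retained_fraction(raw_counts, dedup_counts)
r = log_pearson(raw_counts, dedup_counts)
for gene, frac in fractions.items():
    print(f"{gene}: {frac:.3f} of counts retained")
print(f"log2 Pearson r = {r:.3f}")
```

In a toy case like this, every gene loses counts and the most abundant one loses the largest share, yet the ranking of genes is preserved, so a high correlation and a global drop in counts are not contradictory.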

If basically every gene has duplicates, what does that mean? Should I only use the de-duplicated samples for further analysis? I know there are lots of other threads on this issue, but my duplication seems more severe.

RNA-Seq duplicate reads picard • 3.8k views

Someone with better experimental chops will need to confirm, but perhaps extra amplification cycles caused this problem?

If you feel that the experiment did not work as intended, then perhaps it is time to consider redoing it (at least the library prep). That is easy for someone like me to say, so apologies in advance if this is an irreplaceable sample or a difficult experiment.

7.8 years ago
igor 13k

You definitely have more duplicates than usual. If you started with little RNA, then you must have amplified a lot, so it makes sense that you have a lot of duplicates. They would be found in all genes, since you are amplifying all genes. Thus, all genes would have fewer counts after duplicate removal.
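
To make that reasoning concrete, here is a rough toy simulation, not a model of any real protocol. It assumes a small pool of unique starting molecules per gene (the gene names and molecule numbers are hypothetical), uniform amplification so that every starting molecule is equally likely to be sequenced, and a sequencing depth far larger than the pool. Reads that come from the same starting molecule are treated as duplicates. Real MarkDuplicates works on mapping coordinates instead, so it can also flag independent fragments that happen to align identically.

```python
import random
from collections import Counter

def simulate(molecules_per_gene, n_reads, seed=0):
    """Sample reads uniformly from a pool of unique starting molecules."""
    rng = random.Random(seed)
    # Each starting molecule is identified by (gene, index).
    pool = [(gene, i)
            for gene, n in molecules_per_gene.items()
            for i in range(n)]
    # Uniform amplification: every molecule is equally likely to be read.
    reads = [rng.choice(pool) for _ in range(n_reads)]
    raw = Counter(gene for gene, _ in reads)
    # De-duplication keeps one read per starting molecule.
    dedup = Counter(gene for gene, _ in set(reads))
    return raw, dedup

# Hypothetical low-input library: few unique molecules, deep sequencing.
molecules = {"high_gene": 500, "mid_gene": 50, "low_gene": 5}
raw, dedup = simulate(molecules, n_reads=20000)

for gene in molecules:
    print(f"{gene}: raw={raw[gene]} dedup={dedup[gene]}")
```

Under these assumptions even the gene with only 5 starting molecules is sequenced many times over, so its count drops after duplicate removal just like the highly expressed gene's does.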

See previous extensive discussion on the topic here: How detrimental are duplicate reads in RNAseq experiments?
