Question: Duplicate reads in RNA-seq
4.5 years ago by
mmrcksn50 wrote:

Hi everyone,

I have some paired-end RNA-seq samples with very high levels of duplication (in some samples only 6% of reads remain after de-duplication). I think this is due to the low concentration of input RNA (~1 ng) and to a smaller subset of genes being expressed (the RNA is from a specific cell type isolated from brain). Even after poly-A selection, the most highly expressed gene in my samples was a ribosomal RNA transcript.

I used Picard's MarkDuplicates to remove duplicated reads from my samples and looked at how that affected counting. I was happy to see that the counts for the rRNA gene were greatly reduced, but the counts for almost every other gene were reduced as well. I had thought that only highly expressed genes would have duplicate reads. I also ran a correlation analysis between the original samples and the de-duplicated samples and saw excellent correlation between them, so now I'm just confused.
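For readers who want to reproduce this kind of comparison, here is a minimal Python sketch. The gene names and counts are made-up placeholders, not data from these samples, and the `pearson` helper is written out by hand only to keep the example dependency-free. It illustrates how counts can drop for every gene while the log-scale correlation between the two versions stays high.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-gene counts before and after de-duplication
# (illustrative numbers only, not taken from the thread).
raw = {"rRNA_gene": 500000, "geneA": 12000, "geneB": 900, "geneC": 40}
dedup = {"rRNA_gene": 9000, "geneA": 1500, "geneB": 300, "geneC": 25}

genes = sorted(raw)

# Compare on a log scale with a pseudocount of 1, as is common for count data.
log_raw = [math.log2(raw[g] + 1) for g in genes]
log_dedup = [math.log2(dedup[g] + 1) for g in genes]
r = pearson(log_raw, log_dedup)

# Fraction of reads retained per gene after de-duplication.
retained = {g: dedup[g] / raw[g] for g in genes}

print(f"log-scale Pearson r = {r:.3f}")
for g in genes:
    print(f"{g}: {retained[g]:.1%} of reads retained")
```

In this toy example every gene loses reads, the most abundant gene loses the largest share, and the ranking of genes is unchanged, which is why a high correlation and a global drop in counts are not contradictory.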

If basically every gene has duplicates, what does it mean? Should I only use de-duplicated samples for further analysis? I know there are lots of other threads on this issue but it seems like my duplication is more severe.

rna-seq picard duplicate reads • 2.6k views

Someone with better experimental chops will need to confirm, but perhaps extra cycles of amplification caused this problem?

If you feel that the experiment did not work as intended, then perhaps it is time to consider redoing it (or at least the library prep). That is easy for someone like me to say, so apologies in advance if this is an irreplaceable sample or a difficult experiment.

written 4.5 years ago by GenoMax
4.5 years ago by
United States
igor12k wrote:

You definitely have more duplicates than usual. If you started with little RNA, then you must have amplified a lot, so it makes sense that you have a lot of duplicates. They would be found in all genes, since you are amplifying all genes. Thus, all genes would have fewer counts after duplicate removal.
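A toy simulation can make this concrete. The sketch below is a deliberately simplified model, not a description of any real library prep: it assumes every distinct molecule is amplified equally and that reads are drawn uniformly at random from the amplified pool. The gene names, molecule counts, and read depth are all hypothetical.

```python
import random

def simulate_reads(molecules_per_gene, n_reads, seed=0):
    """Sample reads uniformly from a pool of distinct molecules.

    Returns two dicts keyed by gene: total reads, and reads left after
    collapsing reads that came from the same original molecule.
    """
    rng = random.Random(seed)
    pool = [(gene, i) for gene, n in molecules_per_gene.items() for i in range(n)]
    total = {gene: 0 for gene in molecules_per_gene}
    seen = set()
    for _ in range(n_reads):
        molecule = rng.choice(pool)
        total[molecule[0]] += 1
        seen.add(molecule)
    unique = {gene: 0 for gene in molecules_per_gene}
    for gene, _ in seen:
        unique[gene] += 1
    return total, unique

def duplicate_fraction(total, unique):
    """Fraction of reads per gene that are duplicates of an earlier read."""
    return {g: 1 - unique[g] / total[g] for g in total if total[g] > 0}

N_READS = 20000  # hypothetical sequencing depth

# Hypothetical numbers of distinct starting molecules per gene.
low_input = {"high": 1000, "mid": 100, "low": 10}
high_input = {gene: n * 100 for gene, n in low_input.items()}

dup_low = duplicate_fraction(*simulate_reads(low_input, N_READS))
dup_high = duplicate_fraction(*simulate_reads(high_input, N_READS))

for gene in low_input:
    print(f"{gene}: low input {dup_low[gene]:.1%} duplicates, "
          f"high input {dup_high[gene]:.1%} duplicates")
```

Under these assumptions the duplicate fraction depends on how many reads are sequenced per distinct starting molecule, not on how highly a gene is expressed, so a low-complexity library shows heavy duplication in weakly and strongly expressed genes alike.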

See the earlier, extensive discussion of this topic here: "How detrimental are duplicate reads in RNAseq experiments?"
