Question: PCR duplicates in RNASeq
prakash wrote:

Hello Biostars,

I have a small query regarding the identification and removal of PCR duplicates from RNA-Seq data. The TopHat2 alignment stats calculated with samtools flagstat show no PCR duplicates (alignment stats below), but when I run samtools rmdup, it removes a significant number of reads from the alignment. What explains this discrepancy, or am I missing some information?

Alignment stats before removing duplicates

78300010 + 0 in total (QC-passed reads + QC-failed reads)
4330622 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
78300010 + 0 mapped (100.00% : N/A)
73969388 + 0 paired in sequencing
37276554 + 0 read1
36692834 + 0 read2
68770674 + 0 properly paired (92.97% : N/A)
71824262 + 0 with itself and mate mapped
2145126 + 0 singletons (2.90% : N/A)
422144 + 0 with mate mapped to a different chr
158018 + 0 with mate mapped to a different chr (mapQ>=5)

Alignment stats after removing duplicates

20391758 + 0 in total (QC-passed reads + QC-failed reads)
1265797 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
20391758 + 0 mapped (100.00% : N/A)
19125961 + 0 paired in sequencing
9838044 + 0 read1
9287917 + 0 read2
17932936 + 0 properly paired (93.76% : N/A)
18650095 + 0 with itself and mate mapped
475866 + 0 singletons (2.49% : N/A)
85116 + 0 with mate mapped to a different chr
32276 + 0 with mate mapped to a different chr (mapQ>=5)
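
To put a number on "significant", here is a minimal sketch that compares the two "in total" lines from the flagstat output above. The variable names are made up for illustration, and note that flagstat counts alignment records (secondary alignments included), so this is a fraction of records rather than of unique reads.

```python
# Totals copied from the "in total" lines of the two flagstat reports above.
total_before = 78300010  # alignment records before samtools rmdup
total_after = 20391758   # alignment records after samtools rmdup

# Records dropped by rmdup, as a count and as a percentage of the input.
removed = total_before - total_after
pct_removed = 100.0 * removed / total_before

print(f"removed {removed} records ({pct_removed:.2f}% of the input)")
```

By this count, roughly three quarters of the alignment records were discarded.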
geek_y wrote:

TopHat2 does not flag duplicates by default, so flagstat (as the name indicates, it reports statistics on the flags in the SAM/BAM file) does not show any duplicates.

rmdup removes duplicates based on alignment positions, so it removes them regardless of whether they are flagged.
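
The difference can be sketched in a few lines of Python. This is a toy model, not the samtools implementation: the `Aln` record and both helper functions are hypothetical, and the position-based step is deliberately simplified (it ignores mate coordinates for paired-end reads). The only detail taken from the SAM specification is the duplicate flag bit, 0x400.

```python
from collections import namedtuple

DUP_FLAG = 0x400  # SAM FLAG bit meaning "PCR or optical duplicate"

# Hypothetical, stripped-down alignment record for illustration only.
Aln = namedtuple("Aln", "name chrom pos strand flag mapq")

def count_flagged_duplicates(alns):
    """flagstat-style count: only looks at the duplicate bit in FLAG."""
    return sum(1 for a in alns if a.flag & DUP_FLAG)

def dedup_by_position(alns):
    """Simplified position-based removal: keep one record per
    (chrom, pos, strand), preferring the highest mapping quality."""
    best = {}
    for a in alns:
        key = (a.chrom, a.pos, a.strand)
        if key not in best or a.mapq > best[key].mapq:
            best[key] = a
    return list(best.values())

# Three reads stacked at the same position, none with the duplicate bit set,
# as you would expect from an aligner that does not mark duplicates.
toy = [
    Aln("r1", "chr1", 100, "+", 0, 50),
    Aln("r2", "chr1", 100, "+", 0, 40),
    Aln("r3", "chr1", 100, "+", 0, 50),
    Aln("r4", "chr1", 250, "-", 16, 50),
]

flagged = count_flagged_duplicates(toy)
kept = dedup_by_position(toy)
print(flagged, len(toy) - len(kept))  # flagged count vs. records removed
```

On this toy input the flag-based count is 0 while the position-based step drops 2 of the 4 records, which is the same pattern as in the question: no duplicates reported, yet many reads removed.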

BTW, duplicate removal is not a good idea for RNA-Seq.


Thank you, geek_y!

So if rmdup is removing reads, that indicates my sample contains duplicates. I understand that duplicate removal is not a good idea for a differential expression study, but is it not necessary for alternative splicing analysis?

Reply written 9 days ago by prakash

In RNA-Seq, it's difficult to distinguish true transcript expression from PCR duplicates: highly expressed transcripts naturally produce many reads with identical alignment positions. Removing duplicates is not advisable.

Reply written 9 days ago by geek_y