Do I have to mark duplicates before mapping the reads?
6.0 years ago
yueli7 ▴ 250

Hello,

I think the RNA-seq pipeline is: trim adapters, QC, then map the reads.

Do I have to mark duplicates before mapping the reads?

Thanks in advance.

RNA-Seq • 3.3k views
6.0 years ago

I guess when you say "duplicates before mapping", you mean reads with identical sequences.

You don't need to. And in RNA-Seq, it's not usual to remove duplicate reads.


Thanks!

By duplicates I mean MarkDuplicates.


MarkDuplicates is run after alignment, since it works on the aligned reads in the SAM/BAM file.
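
To show why the alignment has to come first, here is a toy Python sketch of position-based duplicate marking. It is not Picard's actual algorithm: the record layout (plain dicts) and the function name `mark_duplicates` are made up for illustration, and using `mapq` to pick the read to keep is a stand-in assumption. The only real convention used is the SAM flag bits (0x10 for reverse strand, 0x400 for duplicate).

```python
from collections import defaultdict

DUP_FLAG = 0x400      # SAM flag bit: PCR or optical duplicate
REVERSE_FLAG = 0x10   # SAM flag bit: read mapped to the reverse strand


def mark_duplicates(records):
    """Toy duplicate marking on aligned reads (hypothetical helper).

    Each record is a dict with keys: name, flag, chrom, pos, mapq.
    Reads sharing chromosome, start position and strand are grouped;
    the one with the highest mapq is kept and the rest get DUP_FLAG set.
    """
    groups = defaultdict(list)
    for rec in records:
        strand = "-" if rec["flag"] & REVERSE_FLAG else "+"
        groups[(rec["chrom"], rec["pos"], strand)].append(rec)

    for group in groups.values():
        best = max(group, key=lambda r: r["mapq"])  # first one wins on ties
        for rec in group:
            if rec is not best:
                rec["flag"] |= DUP_FLAG
    return records


if __name__ == "__main__":
    reads = [
        {"name": "r1", "flag": 0, "chrom": "chr1", "pos": 100, "mapq": 60},
        {"name": "r2", "flag": 0, "chrom": "chr1", "pos": 100, "mapq": 30},
        {"name": "r3", "flag": 16, "chrom": "chr1", "pos": 100, "mapq": 60},
    ]
    for rec in mark_duplicates(reads):
        print(rec["name"], rec["flag"], bool(rec["flag"] & DUP_FLAG))
```

Here r2 gets flagged because it shares position and strand with r1, while r3 sits on the other strand and is left alone. The grouping key needs mapping coordinates, which do not exist before alignment.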

6.0 years ago
munizmom ▴ 60

As geek_y said, it is usually not necessary. This question has been addressed several times; for example, at http://seqanswers.com/forums/showthread.php?t=6854 you can find a nice post with some views on it.

What I do is first check the FastQC report for the level of duplicated reads, and manually check any overrepresented sequences with BLAST to identify them. If the duplication levels are low, I don't remove anything. If I think removing duplicates might be necessary, I compare the output of samtools view with and without duplicates (selecting the parameters to get both BAM files) and then decide. So far I have never had to remove them ... although my experience is limited :)
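
The with/without comparison can be sketched in a few lines of Python. My assumption is that "selecting the parameters" means filtering on the duplicate flag bit (0x400, i.e. 1024), which with samtools view would be `-F 1024` to drop marked duplicates and `-f 1024` to keep only them. The function names and the example reads below are hypothetical, and the input is plain SAM text lines from a file that has already been through duplicate marking.

```python
DUP_FLAG = 0x400  # SAM flag bit: PCR or optical duplicate (decimal 1024)


def split_by_duplicate_flag(sam_lines):
    """Split SAM alignment lines into (non-duplicates, duplicates).

    Header lines starting with '@' are skipped. The flag is the second
    tab-separated field of each alignment line.
    """
    kept, dups = [], []
    for line in sam_lines:
        if line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        if flag & DUP_FLAG:
            dups.append(line)
        else:
            kept.append(line)
    return kept, dups


def duplication_rate(sam_lines):
    """Fraction of alignment lines carrying the duplicate flag."""
    kept, dups = split_by_duplicate_flag(sam_lines)
    total = len(kept) + len(dups)
    return len(dups) / total if total else 0.0


if __name__ == "__main__":
    # Made-up reads for illustration only.
    sam = [
        "@HD\tVN:1.6\tSO:coordinate",
        "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r2\t1024\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r3\t16\tchr1\t200\t60\t4M\t*\t0\t0\tTTTT\tIIII",
    ]
    kept, dups = split_by_duplicate_flag(sam)
    print(len(kept), "kept,", len(dups), "duplicates")
    print("duplication rate:", duplication_rate(sam))
```

If the rate is low, keeping all reads is the simpler choice, which matches what is said above for RNA-Seq.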

