Question: Should PCR duplicates always be removed?
Asked 2.1 years ago by jtwalker20:

I am working with fairly low-coverage GBS data (average read depth < 11), and I am wondering whether it makes sense to remove PCR duplicates, since they seem to be just adding extra depth. Is there a general rule that should be followed in this situation?


edit: I forgot to mention that I have paired-end reads; I'm not sure whether this changes the answer to my question.
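One way to ground this decision is to first measure how much duplication is actually present. Below is a sketch using samtools (>= 1.9); the file names are placeholders, and the pipeline marks duplicates rather than removing them so the rate can be inspected first:

```shell
# Sketch: estimate the duplicate rate of a coordinate-unsorted BAM
# of paired-end reads. File names (aligned.bam etc.) are placeholders.

samtools collate -o namesorted.bam aligned.bam   # group mates together
samtools fixmate -m namesorted.bam fixmate.bam   # add mate-score tags that markdup needs
samtools sort -o possorted.bam fixmate.bam       # coordinate-sort for markdup
samtools markdup possorted.bam markdup.bam       # mark (do not remove) duplicates

# Extract the duplicate fraction from the flagstat report:
samtools flagstat markdup.bam \
  | awk '/in total/ {t=$1} /duplicates/ {d=$1} END {printf "dup rate: %.2f%%\n", 100*d/t}'
```

If the reported rate is low, as expected for paired-end GBS at this coverage, duplicate removal will change little; a high rate would suggest PCR artifacts worth investigating.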

Tags: pcr-duplicates · samtools · gbs
Answered 2.1 years ago by Fabio Marroni:

Actually, I think that GBS is one of the very few applications in which you can avoid removing PCR duplicates, because the space you are sequencing is usually small enough that you will find perfect duplicates by chance alone, so an apparent duplicate is not necessarily a PCR artifact. However, this depends on several properties: with paired-end reads and such low coverage, you should have few duplicates. If you have a lot, then you have a problem, and your reads represent PCR artifacts more than the distribution of the sample. I also suggest that you look at some of the several software packages developed for working with GBS data, such as STACKS:
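To see why chance duplicates are expected when the sequenced space is small, here is a back-of-envelope sketch (all numbers are illustrative assumptions, not values from this thread). Reads drawn uniformly from a limited set of fragment coordinates collide by birthday-problem logic; paired-end reads are deduplicated on both fragment ends, which multiplies the number of distinguishable coordinates and so lowers the chance-duplicate rate:

```python
# Back-of-envelope estimate of duplicates arising by chance alone in a
# reduced-representation (GBS) library. Assumptions (illustrative):
# reads start uniformly at one of n_sites distinct fragment coordinates.

def expected_chance_duplicates(n_reads, n_sites):
    """Expected fraction of reads that are 'duplicates' (coordinate
    already seen) when n_reads are drawn uniformly from n_sites
    coordinates. The expected number of distinct coordinates hit is
    n_sites * (1 - (1 - 1/n_sites)**n_reads)."""
    distinct = n_sites * (1 - (1 - 1 / n_sites) ** n_reads)
    return (n_reads - distinct) / n_reads

# Single-end: every read from the same cut site shares one start
# position, so apparent "duplicates" are common without any PCR bias.
se = expected_chance_duplicates(n_reads=2_000_000, n_sites=100_000)

# Paired-end: variable insert length adds a second coordinate, here
# assumed to spread each site over ~300 distinguishable fragment sizes.
pe = expected_chance_duplicates(n_reads=2_000_000, n_sites=100_000 * 300)

print(f"single-end chance-duplicate fraction: {se:.2%}")
print(f"paired-end chance-duplicate fraction: {pe:.2%}")
```

Under these toy numbers the single-end rate is very high while the paired-end rate is a few percent, which matches the point above: with paired-end reads at low coverage you should see few duplicates, and a large excess would indicate PCR artifacts.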

EDIT: As Eric correctly pointed out, the correct link to STACKS is this: I apologize for the mistake.


Hi Fabio. Thanks for referring to my GitHub repository. However, my code is not the source of the STACKS package, just a set of scripts for managing GBS projects and running STACKS itself.

STACKS can be found here:

Reply written 20 months ago by Eric Normandeau.


Powered by Biostar version 2.3.0