Question: Should PCR duplicates always be removed?
Asked 2.1 years ago by jtwalker20:

I am working with fairly low-coverage GBS data (average read depth < 11), and I am wondering whether it makes sense to remove PCR duplicates, since they seem to be just adding extra depth. Is there a general rule that should be followed in this situation?


edit: I forgot to mention that I have paired-end reads; I'm not sure whether this changes the answer to my question.
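One way to ground this decision is to first measure how much duplication is actually present. Below is a sketch using samtools (>= 1.9); the file names are placeholders, and the pipeline marks duplicates rather than removing them so the rate can be inspected first:

```shell
# Sketch: estimate the duplicate rate of a coordinate-unsorted BAM
# of paired-end reads. File names (aligned.bam etc.) are placeholders.

samtools collate -o namesorted.bam aligned.bam   # group mates together
samtools fixmate -m namesorted.bam fixmate.bam   # add mate-score tags that markdup needs
samtools sort -o possorted.bam fixmate.bam       # coordinate-sort for markdup
samtools markdup possorted.bam markdup.bam       # mark (do not remove) duplicates

# Extract the duplicate fraction from the flagstat report:
samtools flagstat markdup.bam \
  | awk '/in total/ {t=$1} /duplicates/ {d=$1} END {printf "dup rate: %.2f%%\n", 100*d/t}'
```

If the reported rate is low, as expected for paired-end GBS at this coverage, duplicate removal will change little; a high rate would suggest PCR artifacts worth investigating.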

Tags: pcr-duplicates · samtools · gbs
Answered 2.1 years ago by Fabio Marroni:

Actually, I think that GBS is one of the very few applications in which you can avoid removing PCR duplicates, because the space you are sequencing is usually small enough that you will find perfect duplicates by chance alone, so an apparent duplicate is not necessarily a PCR artifact. However, this depends on several properties: with paired-end reads and such low coverage, you should have few duplicates. If you have a lot, then you have a problem, and your reads represent PCR artifacts more than the distribution of the sample. I also suggest that you look at some of the several software packages developed for working with GBS data, such as STACKS:
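To see why chance duplicates are expected when the sequenced space is small, here is a back-of-envelope sketch (all numbers are illustrative assumptions, not values from this thread). Reads drawn uniformly from a limited set of fragment coordinates collide by birthday-problem logic; paired-end reads are deduplicated on both fragment ends, which multiplies the number of distinguishable coordinates and so lowers the chance-duplicate rate:

```python
# Back-of-envelope estimate of duplicates arising by chance alone in a
# reduced-representation (GBS) library. Assumptions (illustrative):
# reads start uniformly at one of n_sites distinct fragment coordinates.

def expected_chance_duplicates(n_reads, n_sites):
    """Expected fraction of reads that are 'duplicates' (coordinate
    already seen) when n_reads are drawn uniformly from n_sites
    coordinates. The expected number of distinct coordinates hit is
    n_sites * (1 - (1 - 1/n_sites)**n_reads)."""
    distinct = n_sites * (1 - (1 - 1 / n_sites) ** n_reads)
    return (n_reads - distinct) / n_reads

# Single-end: every read from the same cut site shares one start
# position, so apparent "duplicates" are common without any PCR bias.
se = expected_chance_duplicates(n_reads=2_000_000, n_sites=100_000)

# Paired-end: variable insert length adds a second coordinate, here
# assumed to spread each site over ~300 distinguishable fragment sizes.
pe = expected_chance_duplicates(n_reads=2_000_000, n_sites=100_000 * 300)

print(f"single-end chance-duplicate fraction: {se:.2%}")
print(f"paired-end chance-duplicate fraction: {pe:.2%}")
```

Under these toy numbers the single-end rate is very high while the paired-end rate is a few percent, which matches the point above: with paired-end reads at low coverage you should see few duplicates, and a large excess would indicate PCR artifacts.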

EDIT: As Eric correctly pointed out, the correct link to STACKS is this: I apologize for the mistake.


Hi Fabio. Thanks for referring to my GitHub repository. However, my code is not the source of the STACKS package, just a set of scripts for managing GBS projects and running STACKS itself.

STACKS can be found here:

Reply written 20 months ago by Eric Normandeau.


Powered by Biostar version 2.3.0