Question: Doubt on removing duplicates on amplicon sequencing data
Picasa wrote, 2.5 years ago:

Hi,

I want to run a simple SNP analysis.

I have several individuals for which we targeted specific markers, so my reads come from amplicon sequencing. My questions are:

1) Do I have to remove duplicates? From what I understand, tools like Picard look for reads with the same 5' position, but by definition amplicon sequencing reads all start at the same position.

2) If not, how should I treat these data? If an error is propagated during PCR, it will end up as a bad call.

Edit: 3) There are two types of duplicates, optical and PCR. In that case, do I have to remove only the optical duplicates? If yes, do you know how? It seems that Picard does not separate optical and PCR duplicates.

Thanks for your help.

Devon Ryan (Freiburg, Germany) wrote, 2.5 years ago:
  1. No, for the reasons you listed.
  2. Correct, that's the downside of amplicons (unless you put UMIs on your PCR primers).
  3. Removing optical duplicates can be done with clumpify from BBTools. However, this doesn't work that well for amplicons unless you spiked in a lot of PhiX or had a very large number of amplicons on the same lane. Otherwise you end up removing too much sequence (not that this ends up being a huge problem).
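
To make points 1 and 2 concrete, here is a minimal Python sketch, not any tool's actual implementation. It assumes a hypothetical `Read` record carrying a UMI taken from the PCR primer, and made-up reads from a single amplicon. Grouping by 5' position alone collapses the whole amplicon into one "duplicate" group, while adding the UMI to the key keeps one group per original molecule, so a PCR error can be outvoted within its UMI family.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical, simplified read record (illustration only)."""
    chrom: str
    five_prime: int   # 5' alignment position
    strand: str
    umi: str          # assumed UMI sequence from the PCR primer
    base_at_snp: str  # base observed at the SNP site of interest


def group_reads(reads, use_umi):
    """Group reads by 5' position and strand, optionally also by UMI."""
    groups = defaultdict(list)
    for r in reads:
        key = (r.chrom, r.five_prime, r.strand)
        if use_umi:
            key += (r.umi,)
        groups[key].append(r)
    return groups


def consensus_base(group):
    """Majority base within one group of reads."""
    return Counter(r.base_at_snp for r in group).most_common(1)[0][0]


# Made-up amplicon: every read starts at the same position.
reads = [
    Read("chr1", 1000, "+", "AACG", "A"),
    Read("chr1", 1000, "+", "AACG", "A"),
    Read("chr1", 1000, "+", "AACG", "G"),  # PCR/sequencing error in one copy
    Read("chr1", 1000, "+", "TTGC", "A"),
    Read("chr1", 1000, "+", "TTGC", "A"),
    Read("chr1", 1000, "+", "GGAT", "G"),
    Read("chr1", 1000, "+", "GGAT", "G"),
]

by_position = group_reads(reads, use_umi=False)
by_umi = group_reads(reads, use_umi=True)

# One consensus call per UMI family (i.e. per original molecule).
umi_calls = {key[-1]: consensus_base(group) for key, group in by_umi.items()}

print(len(by_position))  # position-only grouping: a single group
print(len(by_umi))       # UMI-aware grouping: one group per molecule
print(umi_calls)
```

Real UMI-aware tools also have to tolerate sequencing errors in the UMI itself, which this sketch ignores.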

Thanks Devon. I have edited my post to add another question; maybe you have not seen it:

3) There are two types of duplicates, optical and PCR. In that case, do I have to remove only the optical duplicates? If yes, do you know how? It seems that Picard does not separate optical and PCR duplicates.

written 2.5 years ago by Picasa

I just edited my response accordingly.

written 2.5 years ago by Devon Ryan

Would you have to remove duplicates when comparing the abundance of two transcript isoforms of a gene?

written 13 days ago by caggtaagtat

It depends on how badly affected they are. In general, if the transcripts are expressed highly enough, you will start getting false-positive duplicates (independent fragments that happen to share the same coordinates), so it's best to avoid removing duplicates unless you really need to.
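
As a rough illustration of why high expression produces false-positive duplicates, the sketch below estimates the fraction of reads that would share a start position purely by chance. It rests on simplifying assumptions that are not from this thread: single-end reads, start positions sampled uniformly and independently, and hypothetical numbers for transcript length and read count.

```python
def expected_false_duplicate_fraction(n_reads: int, n_positions: int) -> float:
    """Expected fraction of independent reads flagged as duplicates by chance.

    Assumes each read picks one of `n_positions` start sites uniformly at
    random. The expected number of distinct occupied sites is
    n_positions * (1 - (1 - 1/n_positions) ** n_reads); every read beyond
    the first at a site would look like a duplicate.
    """
    expected_distinct = n_positions * (1 - (1 - 1 / n_positions) ** n_reads)
    return 1 - expected_distinct / n_reads


# Hypothetical transcript offering 1000 possible start positions.
low = expected_false_duplicate_fraction(n_reads=100, n_positions=1000)
high = expected_false_duplicate_fraction(n_reads=10_000, n_positions=1000)

print(f"lowly expressed:  {low:.1%} of reads look like duplicates")
print(f"highly expressed: {high:.1%} of reads look like duplicates")
```

Under these assumptions the chance-duplicate fraction climbs steeply with read count, which is why removing duplicates would distort abundance estimates for highly expressed isoforms more than for lowly expressed ones.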

written 12 days ago by Devon Ryan