Question: Tools to remove duplicate or substring reads
chjiao3456 (Michigan State University, USA) wrote 4.0 years ago:

Is there an efficient tool to remove duplicate reads or substring reads from an NGS data set? I know that Readjoiner can remove duplicated reads, but it does not seem to work on substring reads. Thanks.

Example:

Duplicates:
read1: AGTCAT
read2: AGTCAT
In this case, only one read will be kept.

Substring:
read1: GTCA
read2: AGTCAT
In this case, read1 will be removed.
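To make the two cases concrete, here is a minimal Python sketch of the intended behavior. It is not how Readjoiner, Dedupe, or SGA work internally; the function name `remove_duplicates_and_substrings` is made up for illustration, it assumes reads are plain strings, it ignores reverse complements and quality scores, and its pairwise comparison is quadratic, so it only suits small inputs.

```python
def remove_duplicates_and_substrings(reads):
    """Return reads with exact duplicates and contained (substring) reads removed.

    Illustrative only: hypothetical helper, forward strand only, O(n^2) comparisons.
    """
    # Collapse exact duplicates; dict.fromkeys keeps the first occurrence in order.
    unique = list(dict.fromkeys(reads))

    # Visit the longest reads first, so every read that could contain the
    # current one has already been considered.
    by_length = sorted(unique, key=len, reverse=True)

    kept = []
    for read in by_length:
        # Drop the read if it occurs inside any read that was already kept.
        if not any(read in longer for longer in kept):
            kept.append(read)
    return kept


if __name__ == "__main__":
    print(remove_duplicates_and_substrings(["AGTCAT", "AGTCAT"]))
    print(remove_duplicates_and_substrings(["GTCA", "AGTCAT"]))
```

With the reads from the examples, both calls keep only AGTCAT. Real data sets need an indexed approach (for example hashing or a suffix structure) rather than this all-against-all check.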

sequencing alignment next-gen • 864 views
Brian Bushnell (Walnut Creek, USA) wrote 4.0 years ago:

The most efficient tool for this purpose is Dedupe from the BBMap package. However, it keeps all reads in memory, so it needs a lot of memory. Can you explain in more detail what you are trying to do?


Thanks for your help. I have added examples in the question.

(reply written 4.0 years ago by chjiao3456)
chjiao3456 (Michigan State University, USA) wrote 4.0 years ago:

I just noticed that the SGA tool is able to do this: "Collapse Reads That Are Substrings Of Other Reads In Same Library".
