Tools to remove duplicate or substring reads
7.4 years ago
chjiao3456 ▴ 40

Is there an efficient tool for removing duplicate reads or substring reads from an NGS data set? I know that readjoiner can remove duplicated reads, but it does not seem to work on substring reads. Thanks.

Example:

Duplicates:
read1: AGTCAT
read2: AGTCAT
In this case, only one read will be kept.

Substring:
read1: GTCA
read2: AGTCAT
In this case, read1 will be removed.
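To make the two rules above concrete, here is a minimal Python sketch of the intended behaviour. It is only an illustration, not how readjoiner, Dedupe, or SGA work internally. The function name `remove_contained_reads` is made up for this example, reads are assumed to be plain strings on the forward strand (no reverse-complement matching), and the quadratic comparison is only practical for small inputs.

```python
def remove_contained_reads(reads):
    """Drop exact duplicates and reads that are substrings of another read.

    Illustrative sketch only: forward strand, O(n^2) comparisons.
    """
    # Collapse exact duplicates; dict.fromkeys keeps first-seen order.
    unique = list(dict.fromkeys(reads))

    kept = []
    # Visit longer reads first, so any read that could contain the
    # current one has already been examined.
    for read in sorted(unique, key=len, reverse=True):
        # Keep the read only if no already-kept read contains it.
        if not any(read in longer for longer in kept):
            kept.append(read)
    return kept


if __name__ == "__main__":
    example = ["AGTCAT", "AGTCAT", "GTCA", "TTGA"]
    print(remove_contained_reads(example))  # ['AGTCAT', 'TTGA']
```

For real NGS data sets a dedicated tool is the better choice, since this sketch compares every read against every kept read.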

next-gen sequencing alignment • 1.5k views
7.4 years ago

The most efficient tool for this purpose is Dedupe from the BBMap package. However, it keeps all reads in memory, so it needs a lot of RAM for large data sets. Can you explain in more detail what you are trying to do?


Thanks for your help. I have added examples in the question.

7.4 years ago
chjiao3456 ▴ 40

Just noticed that the SGA tool is able to do this: "Collapse Reads That Are Substrings Of Other Reads In Same Library".
