How to remove duplicate reads after ONT basecalling from fastq.gz files
14 months ago
Darrenjdd • 0

Hi

My PC hit an error partway through basecalling: it failed on fast5 file number 577. I restarted basecalling from file 578, and once that finished I basecalled 577 on its own. It looks like the original run failed after processing about 1000 reads from that file, so those reads are now present twice. I can create a text file listing the read IDs that were completed the first time, but I need something that searches through the fastq.gz files for the read IDs in that list and removes them. Does anyone know how I can do that?

Thanks.

14 months ago
GenoMax 142k

You should be able to use filterbyname.sh from BBTools. Provide it with the names of reads you want to filter.
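
If installing BBTools is not an option, the same kind of filtering can be sketched in plain Python. This is not how filterbyname.sh works internally, just a minimal illustration. It assumes standard 4-line FASTQ records, a read ID that is the first whitespace-delimited token after the "@", and an ID list with one ID per line; the function names are made up for this example.

```python
import gzip


def read_id(header):
    """Return the read ID from a FASTQ header line ('@<id> <description>')."""
    return header[1:].split()[0]


def load_ids(path):
    """Load read IDs from a text file, one per line (a leading '@' is tolerated)."""
    with open(path) as fh:
        return {line.strip().lstrip("@").split()[0] for line in fh if line.strip()}


def filter_fastq(in_path, out_path, drop_ids):
    """Copy a gzipped FASTQ, skipping records whose read ID is in drop_ids.

    Assumes each record is exactly 4 lines. Returns (kept, dropped) counts.
    """
    kept = dropped = 0
    with gzip.open(in_path, "rt") as fin, gzip.open(out_path, "wt") as fout:
        while True:
            record = [fin.readline() for _ in range(4)]
            if not record[0].strip():
                break  # end of file (or trailing blank line)
            if read_id(record[0]) in drop_ids:
                dropped += 1
            else:
                fout.writelines(record)
                kept += 1
    return kept, dropped
```

For example, `filter_fastq("first_run.fastq.gz", "first_run.dedup.fastq.gz", load_ids("done_ids.txt"))` would write a copy without the listed reads (the file names here are placeholders). Run it only on the output of the interrupted run: applied to every file, including the re-basecalled 577 output, it would drop both copies of each read.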


Perfect! Thank you!
