I want to filter out (remove) some reads from a gzipped FASTQ file, according to a list of read IDs. How should I do this?
Here is one idea. Say we have in.fastq.gz and reads-remove-list:
First, extract the read IDs from the FASTQ file:
zcat in.fastq.gz | awk 'NR%4==1' | sed 's/^@//' > in.fastq.readsID
Second, remove the IDs listed in reads-remove-list to get the IDs of the remaining reads:
grep -v -F -w -f reads-remove-list in.fastq.readsID > remaining.list
Third, use seqtk subseq to extract the remaining reads:
seqtk subseq in.fastq.gz remaining.list | gzip - > remaining.fastq.gz
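I suppose the three steps could also be collapsed into a single awk pass, roughly like the sketch below. It assumes standard 4-line FASTQ records, and that each line of reads-remove-list holds one ID without the leading @, matching the first whitespace-delimited token of the header. filter_reads is just a hypothetical helper name I made up, and gzip -dc is used in place of zcat:

```shell
# filter_reads is a hypothetical helper name, not part of seqtk or any other tool.
# Assumes 4-line FASTQ records and one read ID per line in the remove list.
filter_reads() {
    # $1 = gzipped FASTQ file, $2 = list of read IDs to remove (no leading @)
    gzip -dc "$1" | awk '
        # First input (the remove list): remember each ID to drop.
        FILENAME == ARGV[1] { drop[$1] = 1; next }
        # Header line of each 4-line record: strip the leading @ and look the ID up.
        FNR % 4 == 1 { id = substr($1, 2); keep = !(id in drop) }
        # Print every line of a record only while keep is true.
        keep
    ' "$2" - | gzip
}
```

It would be called as: filter_reads in.fastq.gz reads-remove-list > remaining.fastq.gz. This skips the intermediate ID files and matches IDs exactly, but it does hold the whole remove list in memory.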
Does anyone have a more direct or efficient solution, for example using only shell commands or a short script?
Thanks for your suggestions!