How to exclude reads from a fastq file, using a list of read names?
Asked by Tash, 11 weeks ago

I have a fastq file from a sequencing run and a list of read names that I wish to exclude from the fastq file. I am having trouble filtering out the whole entry based on the names alone. My attempts and rationale are below; any assistance would be much appreciated!

cat names_pc3.txt  
a308dbd5-df59-47f2-a92a-7068ffd0ce8d
353002fa-3d36-4e03-a8c4-b6e983464bf9
697f8ebb-9fa7-487e-ae91-8095bfb9968b
05887f55-8028-4524-8000-71e72de789d0

I can exclude the line containing the read name with grep; however, I want to remove the whole 4-line entry for each read.

To remove the line containing the read name, this works:

grep --no-group-separator -v -f names_pc3.txt PAQ11486_pass_barcode01_52e0f3ff_4295e9c8_0.fastq
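A small toy run shows why this alone is not enough: grep -v drops only the header line, so the other three lines of each excluded record stay behind and the file is no longer valid FASTQ. This is a sketch for illustration; toy.fastq, toy_names.txt and header_removed.txt are made-up names, and the two reads are invented data.

```shell
#!/bin/sh
# Toy example; all file names and reads here are hypothetical.
cd "$(mktemp -d)" || exit 1
printf '@read1\nACGT\n+\nIIII\n@read2\nTTTT\n+\nJJJJ\n' > toy.fastq
printf 'read1\n' > toy_names.txt

# grep -v removes only the header line that matches 'read1';
# the sequence, '+' and quality lines of read1 survive as orphans.
grep -v -f toy_names.txt toy.fastq > header_removed.txt
cat header_removed.txt

# A valid FASTQ has a multiple of 4 lines; this reports 7.
wc -l < header_removed.txt
```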

To view the entry for the reads I want to exclude, this works:

grep --no-group-separator -A 3 -f names_pc3.txt PAQ11486_pass_barcode01_52e0f3ff_4295e9c8_0.fastq

However, when I try to do both at the same time using the command below, it just prints all entries in the fastq:

grep --no-group-separator -A 3 -v -f names_pc3.txt PAQ11486_pass_barcode01_52e0f3ff_4295e9c8_0.fastq

Would love some help with a different approach or correction to my current approach if possible! Thanks.

Answer by Pierre, 11 weeks ago

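It doesn't work because, with -v, grep selects every line that does not match, and -A 3 then also prints the 3 lines after each selected line as context. Each header you want to exclude sits right after the quality line of the previous record, which is itself selected, so the header is printed as context anyway (and the rest of its record is selected because it doesn't match), and essentially the whole file comes back.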
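A minimal toy reproduction makes this visible. It assumes GNU grep (for --no-group-separator); toy.fastq, toy_names.txt and out.txt are hypothetical names, and read2 is the read we try to exclude.

```shell
#!/bin/sh
# Toy reproduction, assuming GNU grep; file names are hypothetical.
cd "$(mktemp -d)" || exit 1
printf '@read1\nACGT\n+\nIIII\n@read2\nTTTT\n+\nJJJJ\n@read3\nGGGG\n+\nKKKK\n' > toy.fastq
printf 'read2\n' > toy_names.txt

# -v selects the 11 lines that do not match 'read2'; -A 3 adds the
# 3 lines after each selected line. '@read2' directly follows 'IIII'
# (a selected line), so it is printed as context: nothing is removed.
grep --no-group-separator -A 3 -v -f toy_names.txt toy.fastq > out.txt
cmp -s toy.fastq out.txt && echo "output identical to input"
```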

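Instead, you could put each 4-line record on a single tab-separated line with paste, filter those lines with grep, and then restore the line breaks with tr: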

cat PAQ11486_pass_barcode01_52e0f3ff_4295e9c8_0.fastq |\
paste - - - - |\
grep -v -F -f names_pc3.txt |\
tr "\t" "\n"
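If you prefer exact matching on the read ID rather than grep's substring matching, a single awk pass is one alternative. This is a sketch, not part of the original answer: it assumes unwrapped 4-line records and that the read ID is the first whitespace-separated field of the header (as in Nanopore headers such as @uuid runid=...). in.fastq, names.txt and filtered.fastq are placeholder names, and the toy data is invented only to make the example runnable.

```shell
#!/bin/sh
# Toy input (hypothetical): headers carry extra tags after the ID.
cd "$(mktemp -d)" || exit 1
printf '@read1 ch=1\nACGT\n+\nIIII\n@read2 ch=2\nTTTT\n+\nJJJJ\n@read22 ch=3\nGGGG\n+\nKKKK\n' > in.fastq
printf 'read2\n' > names.txt

awk '
  FILENAME == ARGV[1] { drop[$1]; next }  # first file: IDs to exclude
  FNR % 4 == 1 {                          # header line of each 4-line record
    id = substr($1, 2)                    # first field without the leading "@"
    skip = (id in drop)                   # exact match, no substrings
  }
  !skip                                   # print all 4 lines unless flagged
' names.txt in.fastq > filtered.fastq

cat filtered.fastq
```

Because the whole first field is compared, read2 does not knock out read22, whereas grep -F with the pattern read2 would drop both. With UUID read names like those in names_pc3.txt such collisions are very unlikely, so the paste/grep pipeline works fine in practice; both approaches assume 4-line records.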

See also: How to remove a list of reads from fastq file? ; How To Extract A Subset Of Reads In Fastq Using An Id List? ; removal of spesific read from fastq file ; ...

Thank you so much Pierre, that worked perfectly! Much appreciated.
