How to subset a FASTQ file based on a list of sequences
7.1 years ago
heso ▴ 40

I have a file1.txt (one sequence per line) that I want to use as a reference to filter file.fastq. To clarify, I want to keep only the file.fastq entries whose sequences perfectly match a sequence in file1.txt.

I know BBMap has filterbyname.sh for filtering by IDs. Is there a tool that I could use for sequence-based filtering? Thanks in advance :)

FASTQ filter • 1.9k views

This is slightly convoluted, but you may be able to use BBMap to align the reads against those sequences. Set minid=0.95 and use outu= to write the reads that do not map to a file. Then filter those reads out of the original file.

There are lots of ways to code this. Have you tried anything, or have any coding/command line experience?

I do have command line experience. Can you suggest some code?
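
One of the many ways to code the exact-match filter described in the question is sketched below in Python. It is only a sketch under stated assumptions: the FASTQ file is uncompressed and uses strict four-line records (no wrapped sequences), matching is case-sensitive, and the function names (load_sequences, read_fastq, filter_fastq) and the output file name are made up for illustration, not taken from any existing tool.

```python
from itertools import islice


def load_sequences(path):
    """Read one sequence per line into a set, skipping blank lines."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def read_fastq(handle):
    """Yield (header, sequence, separator, quality) from 4-line FASTQ records.

    Assumption: every record spans exactly four lines.
    """
    while True:
        record = [line.rstrip("\r\n") for line in islice(handle, 4)]
        if not record:
            return  # clean end of file
        if len(record) < 4:
            if not any(record):
                return  # only trailing blank lines remain
            raise ValueError("truncated FASTQ record: %r" % record)
        yield tuple(record)


def filter_fastq(fastq_path, seq_path, out_path):
    """Write records whose sequence exactly matches a listed sequence.

    Returns the number of records kept.
    """
    wanted = load_sequences(seq_path)
    kept = 0
    with open(fastq_path) as fin, open(out_path, "w") as fout:
        for header, seq, sep, qual in read_fastq(fin):
            if seq in wanted:  # exact, case-sensitive match
                fout.write("\n".join((header, seq, sep, qual)) + "\n")
                kept += 1
    return kept
```

Calling filter_fastq("file.fastq", "file1.txt", "matched.fastq") would write the matching records to matched.fastq (a hypothetical output name) and return how many were kept. Holding the sequences in a set keeps each lookup constant-time on average, so the FASTQ file is read only once regardless of how many sequences are listed.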

7.1 years ago
Jake Warner ▴ 830

You can use Bowtie2. Set the parameters for as strict an alignment as you like, then use the --al option to write a FASTQ file of the reads that aligned and the --un option to write a FASTQ file of the reads that did not.
