4.3 years ago
biomon
Hi,
I would like to subset a fastq file and extract only the reads that start with a specific pattern/nucleotide sequence into a new fastq file. Is there a good way I can do this?
Thanks in advance!
seqkit locate is probably what you're looking for.
Thanks, I'll give it a go and play around with it. I had a look at the manual, but I can only see examples with fasta input, no fastq. The output examples were also not in fastq. Maybe I have missed something.
Hmm, fair point. I don't see anything about fastq files either. That's really unfortunate then.
Maybe something like this then:
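The script posted with this answer is not shown on this page. As a stand-in, here is a minimal sketch of the same idea. Note that it is not the original script: it uses only the Python standard library rather than Biopython, and it assumes plain, uncompressed FASTQ with exactly four lines per read. The function name filter_fastq and the command-line argument order are my own choices for illustration.

```python
#!/usr/bin/env python3
"""Keep only FASTQ reads whose sequence matches a regex (e.g. "^ATGC")."""
import re
import sys
from pathlib import Path


def filter_fastq(in_path, pattern):
    """Write reads matching `pattern` to <name>_filt.<ext> next to the input.

    Assumes 4-line FASTQ records (header, sequence, '+', qualities).
    Returns the output path and the number of reads kept.
    """
    in_path = Path(in_path)
    out_path = in_path.with_name(in_path.stem + "_filt" + in_path.suffix)
    regex = re.compile(pattern)
    kept = 0
    with open(in_path) as fin, open(out_path, "w") as fout:
        while True:
            # Read one record: header, sequence, separator, quality string.
            record = [fin.readline() for _ in range(4)]
            if not record[0]:  # empty header line means end of file
                break
            if regex.search(record[1].strip()):
                fout.writelines(record)
                kept += 1
    return out_path, kept


# Hypothetical usage: python3 filename.py input.fastq "^ATGC"
# Only runs when both arguments are given on the command line.
if __name__ == "__main__" and len(sys.argv) == 3:
    out, n = filter_fastq(sys.argv[1], sys.argv[2])
    print(f"wrote {n} reads to {out}")
```

The leading ^ in the pattern anchors the match to the start of the read, so a read that only contains ATGC somewhere in the middle is skipped. For gzipped input you would need to swap open for gzip.open in text mode.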
Put that in a file (e.g., filename.py), invoke it with python3 like so:
It should create an input_filt.fastq file in the same directory as input.fastq that contains all sequences matching the pattern you supplied to it ("^ATGC" in this example).
(You'll need to have Biopython installed for this little script to work.)
Great, thanks, I will have a go!