3.3 years ago
biomon
Hi,
I would like to subset a fastq file and extract only the reads that start with a specific pattern/nucleotide sequence into a new fastq file. Is there a good way I can do this?
Thanks in advance!
seqkit locate is probably what you're looking for.
Thanks, I'll give it a go and play around with it. I had a look at the manual, but I can only see examples with fasta input, no fastq. The output examples were also not in fastq. Maybe I have missed something.
Hmm, fair point. I don't see anything about fastq files either. That's really unfortunate then.
Maybe something like this then:
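The script from the original post is not preserved in this copy of the thread. As a stand-in, here is a minimal sketch of the same idea. It assumes plain (uncompressed) FASTQ with exactly four lines per record, and it uses only the Python standard library rather than Biopython. The function name `filter_fastq` and the argument order are hypothetical choices for illustration, not the original author's.

```python
import re
import sys
from pathlib import Path


def filter_fastq(in_path, pattern):
    """Copy reads whose sequence matches `pattern` into <name>_filt<ext>.

    Assumes 4-line FASTQ records: header, sequence, '+', qualities.
    Returns the output path and the number of reads kept.
    """
    in_path = Path(in_path)
    # e.g. input.fastq -> input_filt.fastq, in the same directory
    out_path = in_path.with_name(f"{in_path.stem}_filt{in_path.suffix}")
    regex = re.compile(pattern)
    kept = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        while True:
            record = [src.readline() for _ in range(4)]
            if not record[0]:  # end of file
                break
            # record[1] is the sequence line; "^" anchors to the read start
            if regex.search(record[1].strip()):
                dst.writelines(record)
                kept += 1
    return out_path, kept


if __name__ == "__main__":
    if len(sys.argv) == 3:
        out_path, kept = filter_fastq(sys.argv[1], sys.argv[2])
        print(f"wrote {kept} reads to {out_path}")
    else:
        print("usage: python3 filename.py input.fastq PATTERN", file=sys.stderr)
```

With this sketch, the call would look like `python3 filename.py input.fastq "^ATGC"`. The pattern is a regular expression, so the leading `^` is what restricts matches to the start of the read.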
Put that in a file (e.g., filename.py) and invoke it with python3 on input.fastq, supplying the pattern.
It should create an input_filt.fastq file in the same directory as input.fastq that contains all sequences matching the pattern you supplied to it ("^ATGC" in this example).
(You'll need to have Biopython installed for this little script to work.)
Great, thanks, I will have a go!