Question

Filter nanopore fastq files by start time

4.8 years ago

sendhelp ▴ 10

Hi,

I have a set of nanopore fastq files with headers in the following format:

@29d30612-b8b4-4ab5-967f-531fc2851541 runid=0a82096db062bcf8041358b407ab5e0aa5de7138 read=23606 ch=186 start_time=2019-03-11T16:16:33Z flow_cell_id= protocol_group_id=190311_ang_1 sample_id=190311_ang_1
TCAGTGTACTTCGTTCAATCGCGTTTTCGTGCGCTTTCAGCAAATCTGCCTCACCTCTCCTCTCCACA

I'd like to filter the fastq reads and extract only those that occurred in a given time period (e.g. the first hour). I've tried the poretools --end option but can't work out the correct timestamp format to use. Any help would be much appreciated, either with the correct format for poretools or with a command that would let me do this!

Thanks!

nanopore fastq filter start time • 2.3k views

ADD COMMENT • link updated 4.8 years ago by GenoMax 141k • written 4.8 years ago by sendhelp ▴ 10

score 1 · Answer 1 · 2019-07-17

4.8 years ago

GenoMax 141k

There are a couple of solutions in this thread: Extract fastq sequences based on date/time (which is in the header)
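For readers who just want the general idea, here is a minimal Python sketch of this kind of header-based filter. It is not the script from the linked thread. It assumes the start_time format shown in the question (e.g. 2019-03-11T16:16:33Z) and plain four-line FASTQ records; the function names read_start_time and filter_by_window are made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Assumption: timestamps look like the one in the question, e.g. 2019-03-11T16:16:33Z.
# Other basecaller versions may write a different format and would need another pattern.
START_RE = re.compile(r"start_time=(\S+)")
TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def read_start_time(header):
    """Return the start_time field of a FASTQ header as a datetime, or None if absent."""
    match = START_RE.search(header)
    if match is None:
        return None
    return datetime.strptime(match.group(1), TIME_FORMAT)


def filter_by_window(lines, begin, end):
    """Yield 4-line FASTQ records whose start_time satisfies begin <= t < end.

    Assumes unwrapped FASTQ (exactly four lines per read); an incomplete
    trailing record is silently dropped.
    """
    it = iter(lines)
    for record in zip(it, it, it, it):
        started = read_start_time(record[0])
        if started is not None and begin <= started < end:
            yield record


def write_window(in_path, out_path, begin, hours=1):
    """Copy reads that started within `hours` of `begin` from in_path to out_path."""
    end = begin + timedelta(hours=hours)
    with open(in_path) as src, open(out_path, "w") as dst:
        for record in filter_by_window(src, begin, end):
            dst.write("".join(record))
```

For "the first hour" you would pass the run's start as begin, for example write_window("x.fastq", "first_hour.fastq", datetime(2019, 3, 11, 16, 0, 0)). The run start is not stored in the read header, so it has to come from elsewhere (the sequencing report, or the earliest start_time in the file).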

ADD COMMENT • link 4.8 years ago by GenoMax 141k

Thank you! That's solved it!

ADD REPLY • link 4.7 years ago by sendhelp ▴ 10

score 0 · Answer 2 · 2019-07-17

4.8 years ago

colindaven 6.3k

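You might try grep with -A3, which also prints the three lines after each matching header (FASTQ has four lines per read). With GNU grep, add --no-group-separator as well; otherwise grep prints a "--" line between non-adjacent matches, which breaks the FASTQ format.

So if you want data from 16:00 to 16:59, try

grep -A3 --no-group-separator "start_time=2019-03-11T16" x.fastq > test.fastq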

Then compare line counts to see how many reads were kept (divide each count by four to get the number of reads)

wc -l x.fastq

wc -l test.fastq

Check the format is ok too

head test.fastq

Not tested, but the approach may still work :-)
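As a cross-check on the grep output, a small Python sketch like the following counts reads per hour directly from the headers, so you can see how many reads each hourly prefix should match. It assumes four-line FASTQ records and the start_time format from the question; reads_per_hour is a hypothetical helper name, not part of any tool.

```python
import re
from collections import Counter

# Assumption: headers carry start_time=YYYY-MM-DDTHH:MM:SSZ as in the question.
START_RE = re.compile(r"start_time=(\S+)")


def reads_per_hour(lines):
    """Count reads per hour, keyed by the 'YYYY-MM-DDTHH' prefix of start_time."""
    counts = Counter()
    for index, line in enumerate(lines):
        if index % 4 != 0:  # only every fourth line is a header
            continue
        match = START_RE.search(line)
        if match is None:  # header without a usable start_time field
            continue
        counts[match.group(1)[:13]] += 1
    return counts
```

Calling it on an open file handle, e.g. reads_per_hour(open("x.fastq")), gives a Counter whose value for "2019-03-11T16" should equal the line count of test.fastq divided by four.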

ADD COMMENT • link 4.8 years ago by colindaven 6.3k

I found a Python script from the link below that works well, so I haven't tried this, but thank you for the help!

ADD REPLY • link 4.7 years ago by sendhelp ▴ 10