Sub-sampleing bam files based on sequencing_summary.txt (guppy output)
Entering edit mode
10 weeks ago
aj • 0


I am working with Nanopore sequencing data that was base called and demultiplexed with guppy (version unknown, but likely rather recent). Alignment was done using bwa tool.

Now I want to sub-sample an existing bam so that I end up with a new bam file which only contains reads that were sequenced up hour 5 into the sequencing process. I did so by first extracting all read_id values that full fill this criteria.

awk '$10 <= 1800 {print $5}' 20230503_1351_MC-112161_AIB461_29a045a6.sequencing_summary.txt  > f31583_5h_parent_ID_5h.txt

Then I use the resulting file as input for samtools view.

samtools view -b -N f31583_5h_parent_ID_5h.txt REF_aln_trim.bam > sub_sampled.bam

However, after a closer investigation I discovered that not all qname values in REF_aln_trim.bam have a matching value in 20230503_1351_MC-112161_AIB461_29a045a6.sequencing_summary.txt. This is counter intuitive for me as I assumed that xxx.sequencing_summary.txt contains information about every read that was produced during sequencing. Or is the read_id column in xxx.sequencing_summary.txt not equivalent to the qname tag in a bam file. I would appreciate any input or suggestions how to do the aimed sub-sampling better.


samtools guppy • 354 views
Entering edit mode

You could use one of the solutions here to filter reads with specific time stamps: Extract fastq sequences based on date/time (which is in the header)

Entering edit mode

Thanks so much for your fast reply. Do you know why it would work better sampling from fastq files? Base on the rest of my pipeline it would be much more convenient to sample from the bam files. Is there a bullet proof way to do it?

Entering edit mode

I see that you are retrieving read_id. I am not totally sure how it is related to parent_read_id in column 4 in sequencing summary file. It is possible that qnames in your BAM are actually parent_read_id. I have asked another mod who is ONT expert to comment.


Login before adding your answer.

Traffic: 1233 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6