Quick way to get a field such as QNAME from the last read in a bam file?
1
0
Entering edit mode
6 weeks ago
kalavattam ▴ 80

To get the QNAME field from the last read in a bam file, I do the following:

samtools view "${bam}" | tail -1 | cut -f1

However, this takes quite a long time for larger bam files (for example, I am working with bam files in the range of 40–50 GB). Does anyone know of a faster and perhaps less resource intensive way to do this?

bam samtools • 505 views
ADD COMMENT
2
Entering edit mode
6 weeks ago

get the last chromosome of the bam using samtools idxstat in.bam and then use the index:

samtools view in.bam <NAME_OF_LAST_CHROM>| tail -1

ADD COMMENT
0
Entering edit mode

Thank you. Do you know of any strategies for non-coordinated-sorted bam files?

ADD REPLY
0
Entering edit mode

If these are unsorted files, why would you need the last read? Couldn't you pick a random read or the first one?

ADD REPLY
0
Entering edit mode

Thanks, they are queryname-sorted files.

ADD REPLY
0
Entering edit mode

I wrote a small Python script that uses pysam to write a bam outfile from a bam infile (queryname-sorted) filtered to exclude all reads in a txt list of querynames (reverse queryname-sorted). That is, if a given queryname is in the list, then it is not written to the bam outfile.

how is it related to your original question ?

Quick way to get a field such as QNAME from the last read in a bam file?

furthermore you re-invented https://gatk.broadinstitute.org/hc/en-us/articles/360036882611-FilterSamReads-Picard- + READ_LIST_FILE

ADD REPLY
0
Entering edit mode

Thank you and apologies. To keep the discussion on point, I removed the non-relevant information.

Answering the question from Friederike: I use the queryname of the last read to break a while loop in a script that filters a bam file to exclude reads with querynames that match those in a user-supplied list of querynames.

As Pierre mentioned, this is what picard FilterSamReads READ_LIST_FILE="list-of-querynames.txt" FILTER="excludeReadList" does. Discussion for how and why there's a new implementation, and how it's different, is better suited for a new post.

ADD REPLY

Login before adding your answer.

Traffic: 1641 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6