Can I get each chromosome's offset range in a sorted BAM/SAM file?
7.9 years ago
deepak643 • 0

I am looking to get the offset range of each chromosome from a sorted SAM/BAM file. Is it possible?

BAM SAM • 1.8k views

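For indexed BAM files, this is available in the .bai index file. If you take the virtual file offset of the first bin of each chromosome, you should be able to shift it right by 16 bits (>> 16) and get the file offset of the compressed BGZF block in which that bin's records start. The lower 16 bits give the position within that block once it is decompressed.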

Edit: For SAM files, I suppose you could try tabix, but I'm not sure why one would want to do that.
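The shift described above for BAM virtual offsets can be sketched with plain shell arithmetic. This is only an illustration: the value of voffset is made up here rather than read from a real .bai file, and the variable names are hypothetical.

```shell
# Split a BGZF virtual file offset into its two parts.
# voffset is a made-up example value, not taken from a real index:
# block at compressed byte 4660, record at byte 291 inside the block.
voffset=$(( (4660 << 16) | 291 ))

# Upper 48 bits: byte offset of the BGZF block in the compressed file.
block_offset=$(( voffset >> 16 ))

# Lower 16 bits (mask 65535 = 0xFFFF): byte offset inside that block
# after it has been decompressed.
within_block=$(( voffset & 65535 ))

echo "block_offset=$block_offset within_block=$within_block"
```

Only block_offset is something you could fseek to in the .bam file itself; within_block is meaningful only after that block has been inflated.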


Is there any API or samtools command to get the virtual offset from the BAI file?


I don't think it's meant to be used (so it's not really documented), but presumably the hts_idx_t object from htslib would hold that.


What do you mean by "offset ranges"? The position in the file (as you would pass to fseek)?


Yes, the location in the file to fseek to. In a sorted BAM, the chromosome records run sequentially from chr1 to chr25. By offset ranges I mean something like: chr1's records are on lines 1 to 10, chr2's are on lines 11 to 20, and so on. Is there any way I can get those offset ranges? Even just knowing the line at which chr2, chr3, etc. start would help.


What Devon said. You just need a BAI index to get specific reads at a given location.

7.9 years ago
venu 7.1k

From your comment: "even knowing from which line chr2/chr3/.. starts will also help"

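I would do something like the following:

  • Create a file containing all the chromosome names, one per line, say chr.txt

Using chr.txt, I would then match each name exactly against the reference-name column (field 3) of foo.sam, skipping the @ header lines, and keep the first line number seen for each chromosome:

awk -F'\t' 'NR==FNR {want[$1]; next} !/^@/ && ($3 in want) && !seen[$3]++ {print FNR, $3}' chr.txt foo.sam > chr_line_number.txt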

head chr_line_number.txt

1  chr1
3  chr2
5  chr3
...
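
Since the question also asks about fseek positions, here is a minimal sketch for an uncompressed SAM that records both the line number and the byte offset at which each reference name first appears in column 3. The file toy.sam and its records are made up for illustration, and the sketch assumes Unix line endings. Byte offsets computed this way do not carry over to BAM, which is BGZF-compressed.

```shell
set -eu
LC_ALL=C; export LC_ALL   # make awk's length() count bytes, not characters

workdir=$(mktemp -d)
sam="$workdir/toy.sam"

# Tiny made-up, coordinate-sorted SAM: 2 header lines, then 4 records.
printf '@SQ\tSN:chr1\tLN:1000\n'  > "$sam"
printf '@SQ\tSN:chr2\tLN:1000\n' >> "$sam"
printf 'r1\t0\tchr1\t10\t60\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$sam"
printf 'r2\t0\tchr1\t20\t60\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$sam"
printf 'r3\t0\tchr2\t10\t60\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$sam"
printf 'r4\t0\tchr2\t20\t60\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$sam"

# For the first record of each reference name, print:
#   name, 1-based line number, 0-based byte offset of the line start.
# offset is advanced after the check, so it always points at the
# start of the current line (+1 accounts for the newline).
awk -F'\t' '
  BEGIN { offset = 0 }
  !/^@/ && !seen[$3]++ { print $3, NR, offset }
  { offset += length($0) + 1 }
' "$sam" > "$workdir/chr_start.txt"

cat "$workdir/chr_start.txt"
# chr1 3 40
# chr2 5 110
```

Each chromosome's range then ends one byte before the next chromosome's start offset. To check an offset by hand, something like tail -c +111 toy.sam | head -n 1 should show the first chr2 record, because tail -c +N counts bytes from 1.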