Getting list of chromosome names from indexed BAM file
8.1 years ago

Given an indexed BAM file, is the following command guaranteed to give the full and correct listing of (UCSC) chromosome names for reads in the BAM file?

$ samtools view -H foo.bam | cut -f2 | grep '^SN:chr' | sed 's/SN://'

I'm trying to think of any gotchas or custom implementations. Are there other tags that could be used for chromosome names in BAM file headers, or other issues I'm not thinking of?

bam samtools • 19k views
8.1 years ago

If the file is already indexed, you can use the idxstats tool.

samtools idxstats NA18152.bam | head -n 3
chr1    247249719    123894    0
chr2    242951149    97215    0
chr3    199501827    81334    0

Then:

samtools idxstats NA18152.bam | cut -f 1 | head -3
chr1
chr2
chr3
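
One thing to watch for: idxstats lists every reference in the header, including those with zero reads, and it normally ends with a `*` line that counts unmapped reads with no reference. A minimal sketch that keeps only references with at least one mapped read, assuming the usual tab-separated columns (name, length, mapped, unmapped). `chroms_with_reads` is a made-up helper name, not part of samtools:

```shell
# Hypothetical helper (not a samtools command): reads `samtools idxstats`
# output on stdin and prints reference names with at least one mapped read.
# Assumes column 1 is the name and column 3 is the mapped-read count.
chroms_with_reads() {
  # Skip the "*" line (unplaced unmapped reads) and references with 0 mapped reads.
  awk -F'\t' '$1 != "*" && $3 > 0 { print $1 }'
}

# Usage (assumes samtools is on PATH and NA18152.bam is indexed):
# samtools idxstats NA18152.bam | chroms_with_reads
```

Drop the `$3 > 0` condition if you want every reference named in the header rather than only those that have reads.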
8.1 years ago

Users can add their own custom header lines, so there's nothing preventing a line like:

@FU    SN:chromosomes are fun

To avoid such issues, grep for "^@SQ" first and then do the cut -f2, since the @SQ lines are the ones that define the reference sequences.
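
A rough sketch of that idea, which also avoids relying on SN: being the second column (as far as I know, the SAM spec does not fix the order of fields within a header line). `sq_names` is a made-up helper name, and it reads header text on stdin so it does not depend on any particular BAM file:

```shell
# Hypothetical helper (not a samtools command): reads SAM header text on
# stdin and prints the SN: value of every @SQ line, ignoring other records.
sq_names() {
  awk -F'\t' '
    $1 == "@SQ" {
      # Scan all fields, since SN: is not assumed to be in column 2.
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^SN:/) { print substr($i, 4); break }
      }
    }'
}

# Usage (assumes samtools is on PATH and foo.bam exists):
# samtools view -H foo.bam | sq_names
```

This prints all reference names, not only those starting with chr, so pipe the result through `grep '^chr'` if you only want UCSC-style names.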
