Question: Getting list of chromosome names from indexed BAM file
Alex Reynolds wrote, 5.2 years ago:

Given an indexed BAM file, is the following command guaranteed to give the full and correct listing of (UCSC) chromosome names for reads in the BAM file?

$ samtools view -H foo.bam | cut -f2 | grep '^SN:chr' | sed s'/SN://'

I'm trying to think of any gotchas or custom implementations. Are there other tags that could be used for chromosome names in BAM file headers, or other issues I'm not thinking of?

Aaronquinlan wrote, 5.2 years ago:

If the file is already indexed, you can use the `idxstats` tool.

    samtools idxstats NA18152.bam | head -n 3
    chr1    247249719    123894    0
    chr2    242951149    97215    0
    chr3    199501827    81334    0

 

Then keep only the first column to get the names:

    samtools idxstats NA18152.bam | cut -f 1 | head -n 3
    chr1
    chr2
    chr3
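One thing to watch for with this approach: as far as I know, `idxstats` lists every reference sequence in the header, including those with zero reads, and ends with a `*` line for unplaced unmapped reads. If you only want references that actually have reads, a minimal sketch could look like the following. The function name `names_with_reads` is made up for illustration, and it assumes the usual four tab-separated columns (name, length, mapped count, unmapped count).

```shell
# Hypothetical helper: reads `samtools idxstats` output on stdin and
# prints only reference names that have at least one mapped read.
# Assumes tab-separated columns: name, length, mapped, unmapped.
names_with_reads() {
  # Skip the trailing "*" line (unplaced reads) and zero-count references.
  awk -F'\t' '$1 != "*" && $3 > 0 { print $1 }'
}

# Intended usage (assumes an indexed BAM named foo.bam):
#   samtools idxstats foo.bam | names_with_reads
```

Dropping the `$3 > 0` condition gives every reference name in the header minus the `*` line.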

 

Devon Ryan wrote, 5.2 years ago:

Users can add their own custom tags, so there's nothing preventing a line like:

@FU    SN:chromosomes are fun

To avoid such issues, grep for "^@SQ" first and then do the "cut -f2", so that only the reference sequence lines are considered.
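A rough sketch of that idea, slightly more defensive than `cut -f2`: it looks up the `SN:` tag by name on each `@SQ` line instead of assuming it is the second field. The function name `sq_names` is hypothetical, and it assumes the header is tab-separated, as `samtools view -H` prints it.

```shell
# Hypothetical helper: reads a SAM header on stdin and prints the SN
# value of every @SQ line, wherever the SN tag sits on the line.
sq_names() {
  awk -F'\t' '
    $1 == "@SQ" {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^SN:/) {
          print substr($i, 4)   # drop the leading "SN:"
          break
        }
      }
    }'
}

# Intended usage (assumes a BAM named foo.bam):
#   samtools view -H foo.bam | sq_names
```

Unlike the `grep '^SN:chr'` filter in the question, this also returns references that do not start with `chr`, so add that filter back if you only want UCSC-style names.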
