Get Bases And Quality From A Bam File [C++]
2
0
Entering edit mode
12.0 years ago
TwoFaces ▴ 10

I need to analyse a bam file with bamtools for C/C++. Can anyone help me reading the bases (A or C or G or T ) of the reads from a bam file? I also need to know the quality of bases (base quality) and reads (mapping quality), but unfortunately I have no idea how to do it.

At the moment i'm only able to open the bam file:

# include "sam.h"

int main (int argc, char *argv[])
{
   bamFile bam_sorted_file;
   if (argc == 2 )  
   {
       bam_sorted_file = bam_open(argv[1], "rb");
       if ( bam_sorted_file==0 )
       {
           cout << "Fail to open BAM file " << argv[1] << endl;
           return 1;
        }
        bam_close(bam_sorted_file);
    }
    return 0;
}

thanks in advance.

read bam • 6.7k views
ADD COMMENT
7
Entering edit mode
12.0 years ago

here is my solution:

/* prints bases and quals */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <bam.h>
#define WHERE fprintf(stderr,"%d\n",__LINE__)
int main(int argc,char** argv)
    {
    bam1_t* b=bam_init1();
    bamFile in= bam_open(argv[1], "r");
    bam_header_t *header;
        if(in==NULL) return -1;
        if(b==NULL) return -1;
        header = bam_header_read(in);
        while(bam_read1(in,b) >= 0)
            {
            int i;
            const bam1_core_t *c = &b->core;
            uint8_t *s = bam1_seq(b), *t = bam1_qual(b);

            fwrite(bam1_qname(b), c->l_qname-1, sizeof(char),stdout); 
            fputc('\t',stdout);
            for (i = 0; i < c->l_qseq; ++i) fputc(bam_nt16_rev_table[bam1_seqi(s, i)], stdout);
            fputc('\t',stdout);
            if (t[0] == 0xff)
                {
                fputs("*",stdout);
                }
            else
                {
                for (i = 0; i < c->l_qseq; ++i) fputc(t[i] + 33,stdout);
                }
            fputc('\n',stdout);
            }
        bam_header_destroy(header);
        bam_close(in);
        bam_destroy1(b);
        return 0;
    }

compile:

 gcc program.c -I /path/tp/samtools -L /path/to/samtools -lbam -lz

test:

$ ./a.out examples/toy.bam
r001    TTAGATAAAGAGGATACTG *
r002    AAAAGATAAGGGATAAA   *
r003    AGCTAA  *
r004    ATAGCTCTCAGC    *
r003    TAGGC   *
r001    CAGCGCCAT   *
x1  AGGTTTTATAAAACAAATAA    ????????????????????
x2  GGTTTTATAAAACAAATAATT   ?????????????????????
x3  TTATAAAACAAATAATTAAGTCTACA  ??????????????????????????
x4  CAAATAATTAAGTCTACAGAGCAAC   ?????????????????????????
x5  AATAATTAAGTCTACAGAGCAACT    ????????????????????????
x6  TAATTAAGTCTACAGAGCAACTA ???????????????????????
ADD COMMENT
0
Entering edit mode

thanks! I'm analyzing the code, trying to understand the functions. What is the hexadecimal code 0xff for t[0] ( if t [0] == 0xff ) and consequently the "*" ? and in either case, what is 33 showing in t [i] + 33?

thanks!!

ADD REPLY
0
Entering edit mode

another question: watching the type "bam1_core_t" at http://samtools.sourceforge.net/samtools/bam/index.html?Functions/Functions.html#//apple_ref/c/func/bam_write1, there is the explanation of the "strand" field, which if I understand it indicates if the filament is forward or backward ... but is not implemented in the structure bam1_core_t

ADD REPLY
0
Entering edit mode

if t [0] == 0xff = no data available , i+33 = quality is shifted with 33 to make it as a printable as an ASCII character.

ADD REPLY
3
Entering edit mode
12.0 years ago

You might take a look at using Bamtools, a C++ API for working with sam/bam files:

https://github.com/pezmaster31/bamtools

ADD COMMENT

Login before adding your answer.

Traffic: 1440 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6