Coverage In Bam File - Bases And Overall Count
2
6
Entering edit mode
9.6 years ago

This might be a bit duplicate question, but previous posts did not help me.

I have a reference sequence, I map reads on it and get .bam file. Now I would like to get number of reads overlapping every single base in my reference sequence (=coverage per base).

Secondly, I would like to get the number of bases at these positions, like:

pos A C G T
1 A:2 C:3 G:0 T:1
2 A:2 C:3 G:0 T:1
3 A:0 C:0 G:0 T:7
4 ..
..

Thanks a lot!

bam coverage • 17k views
ADD COMMENT
0
Entering edit mode

what did you try ?

ADD REPLY
0
Entering edit mode

I have found that samtools mpileup -f ref.fasta aln_sorted.bam does the first part for me.

ADD REPLY
1
Entering edit mode

check my answer here, I think this is what you are looking for.

ADD REPLY
0
Entering edit mode

I have had problem installing Bio::DB::Sam, so let me write my solution here in case someone is interested; First, recompile samtools with CSFLAG -fPIC, as suggested here: http://cpansearch.perl.org/src/LDS/Bio-SamTools-1.36/README This way you get those libbam.a and bam.h files. Then use just perl -MCPAN -e 'install Bio::DB::Sam' and as a path write way to newly compiled samtools. Enjoy library :-)

ADD REPLY
0
Entering edit mode

to do the first part, samtools depth is a direct way to get each position's coverage. much faster than processing the output of samtools mpileup. to do the second part, I would also go for samtools mpileup as mentioned here. if both things are needed I would then try to get all the information from a single samtools mpileup output parsing.

ADD REPLY
6
Entering edit mode
9.6 years ago
Farhat ★ 2.9k

Both the coverage per base and the number of a particular base at each position can be obtained using samtools mpileup in.bam. This results in a file something like this:

Contig18263     110     N       1       T       e
Contig18263     111     N       1       C       c
Contig18263     112     N       1       T       e
Contig18263     113     N       1       T       d
Contig18263     114     N       1       T$      f
Contig18263     132     N       3       ^$T^*T^$T       gfh
Contig18263     133     N       5       TTT^$T^$T       hhhhf
Contig18263     134     N       7       AAAAA^!A^!A     hhhhfeg
Contig18263     135     N       7       TTTTTTT hhhhfgf
Contig18263     136     N       7       GGGGGGG hhhhfgg
Contig18263     137     N       7       TTTTTTT hghhfhh

The 4th column gives you the per base coverage. Processing the 5th column can give you the number of bases at each position with something like (in Python) Not the best code but you get the idea:

for line in in.pileup:
    linsplit=line.split()
    print linsplit[0], linsplit[1], "A:"+str(linsplit[4].count('A')+linsplit[4].count('a')), "C:"+str(linsplit[4].count('C')+linsplit[4].count('c')), "G:"+str(linsplit[4].count('G')+linsplit[4].count('g')), "T:"+str(linsplit[4].count('T')+linsplit[4].count('t'))
ADD COMMENT
0
Entering edit mode

Thank you for your help.

ADD REPLY
0
Entering edit mode

One more question:) Now I use samtools mpileup -f reference.fasta -r contig aln.bam >mpileup command. However, results not always start from 1st base (which is desired behaviour in my case). Please, how could I fix this? Right now, I do not see any logic in where it starts reporting.

ADD REPLY
1
Entering edit mode

It will start reporting from the position where there are alignments. If there are no reads aligned to the first position, it will not report them.

ADD REPLY
0
Entering edit mode

Well, that what I would expect, but unfortunately there are reads on those first positions (I can see them in .bam file). I guess I will try to contact authors of samtools.

ADD REPLY
1
Entering edit mode
6.1 years ago
Srihari ▴ 30

I find bam-readcount to be very useful here - it directly works with the bam file (no need for mpileup). Use it like so:

bam-readcount -f reference.fa file.bam

What is also nice is that bam-readcount can take in a file with only the positions/regions you're interested in (eg -l file.bed) and spit out the split of bases/insertions/deletions at each position.

ADD COMMENT

Login before adding your answer.

Traffic: 1986 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6