Question: Coverage In Bam File - Bases And Overall Count
5
gravatar for Biomonika (Noolean)
6.5 years ago by
State College, PA, USA
Biomonika (Noolean)3.0k wrote:

This might be a bit duplicate question, but previous posts did not help me.

I have a reference sequence, I map reads on it and get .bam file. Now I would like to get number of reads overlapping every single base in my reference sequence (=coverage per base).

Secondly, I would like to get the number of bases at these positions, like:

pos A C G T
1 A:2 C:3 G:0 T:1
2 A:2 C:3 G:0 T:1
3 A:0 C:0 G:0 T:7
4 ..
..

Thanks a lot!

bam coverage • 11k views
ADD COMMENTlink modified 2.9 years ago by Srihari30 • written 6.5 years ago by Biomonika (Noolean)3.0k

what did you try ?

ADD REPLYlink written 6.5 years ago by Pierre Lindenbaum118k

I have found that samtools mpileup -f ref.fasta aln_sorted.bam does the first part for me.

ADD REPLYlink written 6.5 years ago by Biomonika (Noolean)3.0k
1

check my answer here, I think this is what you are looking for.

ADD REPLYlink modified 6.5 years ago • written 6.5 years ago by Arun2.3k

I have had problem installing Bio::DB::Sam, so let me write my solution here in case someone is interested; First, recompile samtools with CSFLAG -fPIC, as suggested here: http://cpansearch.perl.org/src/LDS/Bio-SamTools-1.36/README This way you get those libbam.a and bam.h files. Then use just perl -MCPAN -e 'install Bio::DB::Sam' and as a path write way to newly compiled samtools. Enjoy library :-)

ADD REPLYlink written 6.4 years ago by Biomonika (Noolean)3.0k

to do the first part, samtools depth is a direct way to get each position's coverage. much faster than processing the output of samtools mpileup. to do the second part, I would also go for samtools mpileup as mentioned here. if both things are needed I would then try to get all the information from a single samtools mpileup output parsing.

ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by Jorge Amigo11k
5
gravatar for Farhat
6.5 years ago by
Farhat2.9k
Pune, India
Farhat2.9k wrote:

Both the coverage per base and the number of a particular base at each position can be obtained using samtools mpileup in.bam. This results in a file something like this:

Contig18263     110     N       1       T       e
Contig18263     111     N       1       C       c
Contig18263     112     N       1       T       e
Contig18263     113     N       1       T       d
Contig18263     114     N       1       T$      f
Contig18263     132     N       3       ^$T^*T^$T       gfh
Contig18263     133     N       5       TTT^$T^$T       hhhhf
Contig18263     134     N       7       AAAAA^!A^!A     hhhhfeg
Contig18263     135     N       7       TTTTTTT hhhhfgf
Contig18263     136     N       7       GGGGGGG hhhhfgg
Contig18263     137     N       7       TTTTTTT hghhfhh

The 4th column gives you the per base coverage. Processing the 5th column can give you the number of bases at each position with something like (in Python) Not the best code but you get the idea:

for line in in.pileup:
    linsplit=line.split()
    print linsplit[0], linsplit[1], "A:"+str(linsplit[4].count('A')+linsplit[4].count('a')), "C:"+str(linsplit[4].count('C')+linsplit[4].count('c')), "G:"+str(linsplit[4].count('G')+linsplit[4].count('g')), "T:"+str(linsplit[4].count('T')+linsplit[4].count('t'))
ADD COMMENTlink written 6.5 years ago by Farhat2.9k

Thank you for your help.

ADD REPLYlink written 6.5 years ago by Biomonika (Noolean)3.0k

One more question:) Now I use samtools mpileup -f reference.fasta -r contig aln.bam >mpileup command. However, results not always start from 1st base (which is desired behaviour in my case). Please, how could I fix this? Right now, I do not see any logic in where it starts reporting.

ADD REPLYlink written 6.4 years ago by Biomonika (Noolean)3.0k
1

It will start reporting from the position where there are alignments. If there are no reads aligned to the first position, it will not report them.

ADD REPLYlink written 6.4 years ago by Farhat2.9k

Well, that what I would expect, but unfortunately there are reads on those first positions (I can see them in .bam file). I guess I will try to contact authors of samtools.

ADD REPLYlink written 6.4 years ago by Biomonika (Noolean)3.0k
1
gravatar for Srihari
2.9 years ago by
Srihari30
Srihari30 wrote:

I find bam-readcount to be very useful here - it directly works with the bam file (no need for mpileup). Use it like so:

bam-readcount -f reference.fa file.bam

What is also nice is that bam-readcount can take in a file with only the positions/regions you're interested in (eg -l file.bed) and spit out the split of bases/insertions/deletions at each position.

ADD COMMENTlink written 2.9 years ago by Srihari30
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1994 users visited in the last hour