Question: How to count the total aligned bases in BAM?
0
gravatar for vw
18 months ago by
vw20
vw20 wrote:

A quick question: I want to count the total aligned bases in a BAM file. I try to apply pysam (samtools) and pybedtools (bedtools) to do that. How can I make it? Thanks.

EDIT: I have solved this question with pybedtools and pandas:

 genomecov_base = bd.BedTool.genome_coverage(bam, dz=True) ## Depth per base on genome    

 gencov_perbase = pan.read_table(genomecov_base.fn, names=['chrom', 'position', 'depth'])
 gencov_allbase = (gencov_perbase.iloc[:, 2]).sum()    ## Total bases mapped on genome
pybedtools pysam bam python • 2.0k views
ADD COMMENTlink modified 18 months ago • written 18 months ago by vw20

Hi, this sounds like something I need as well. Does this count the actual number of reads that contain a particular base or just how many reads aligned at that position? For example at position 1 if I know I have 100 reads aligned there, would it tell me I have 50 A's, 25 a, 25 C? Something like that? Or does it just say that there are 100 bases aligned there? Cheers :)

ADD REPLYlink written 11 months ago by DNAngel40

Hi DNAngel, it does not return the per base output, but one option I would recommend for such a calculation is sambamba depth. This will return the per base output at each, although the files become quite large.

ADD REPLYlink modified 10 months ago • written 10 months ago by drkennetz440
5
gravatar for drkennetz
18 months ago by
drkennetz440
drkennetz440 wrote:

Hi, if it is available you could do a quick bedtools and awk calculation:

bedtools genomecov -ibam input.bam -bga > output_aln.bam
awk '{s+=$4}END{print s}' output_aln.bam

this will output the total of reads at each base from your reference genome in your bam file, and then awk will just some the number of reads at each base (giving you the total aligned bases).

I hope this helps!

ADD COMMENTlink modified 18 months ago • written 18 months ago by drkennetz440

Thanks for your code. However, the parameter -a cannot be recognized. I cannot find it in the list also.

ADD REPLYlink written 18 months ago by vw20

I solved this question based on your suggestion. I have re-edited my question. Thank you!

ADD REPLYlink written 18 months ago by vw20

No problem! Glad to help. The -a flag means output all positions (even those with 0 coverage) which may or may not be useful downstream. You don't actually have to include it if you just want the aligned bases coverage, but sometimes it is useful if you want regions with 0 coverage too, in case you have a targeted region and you are trying to see how well it is covered.

ADD REPLYlink written 18 months ago by drkennetz440
1

Hi drkennetz, I tried your code again. I found that genomecov cannot work with -a(at least on my laptop). But coverage does. In order to obtain the depth of all positions, I tried -d or -dz with genomecov. They work fine.

ADD REPLYlink written 18 months ago by vw20

This was a mistake on my part! it should be:

bedtools genomecov -ibam input.bam -bga > output_aln.bam

the -a flag is for bedtools coverage, you are correct! The bga flag means bed graph all so it reports positions with 0 coverage as well. If you do not want 0 coverage positions you can use the flag -bg instead. I edited the answer to reflect the change. Thanks for catching this!

ADD REPLYlink written 18 months ago by drkennetz440

If the question is answered, please consider to mark drkennetz response as accepted answer to help others that have the same question in the future.

ADD REPLYlink written 18 months ago by ATpoint26k

This will sum the reads per base, while the question was about the number of aligned bases (i.e. each covered base should be counted once, regardless of how many reads aligns over it). In this case, just sum the length of the aligned contig (also, using just -bg should speed up the process):

bedtools genomecov -ibam input.bam -bg > output_aln.bg
awk '{ s+=($3-$2) } END { print s }' output_aln.bg
ADD REPLYlink written 5 weeks ago by benpo0
1
gravatar for finswimmer
18 months ago by
finswimmer13k
Germany
finswimmer13k wrote:

Hello vw,

beside other metrics you can get this value with CollectAlignmentSummaryMetrics from picard tools.

fin swimmer

ADD COMMENTlink written 18 months ago by finswimmer13k

Thanks for your help.

ADD REPLYlink written 18 months ago by vw20
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1695 users visited in the last hour