Question: Calculating coverage of genes which have accounted for multiple best hits from bam/sam files
gravatar for cwwong13
2.9 years ago by
cwwong130 wrote:

I am new to bioinformatics and currently working on a project need to calculate the coverage of each gene from a metagenomic data set. However, there are several questions cannot be solved after I went through many posts from this platform and other websites (including the manuals of samtools, bwa, bedtools)...

My goal is to:

  1. Map metagenomic sequencing data with bwa aln (it is based on one of the references' method) against a custom reference genome
  2. Calculate the coverage of each gene from the reference genome

However, for the second step, I expect that some of the reads will be mapped to multiple regions in the reference genome. Therefore, I would like to ask are there any method to get the coverage that is weighted by the best tied hits? In other words, if a read mapped to 2 regions in total (these two regions have same mapping score), they both contribute 50% when calculating coverage.

I found that there is a X0 flag in the sam output from bwa, and I previously use this to retrieve how many times do that sequence mapped. However, when I was using bedtools, it seems to discard all alternative hits that have the same score (it count only the first region showed in sam file in standard output).

I am not sure whether my work flow is alright, thus it would be great if you could provide some suggestions (any parameters or flags should be used during samtools, bedtools, bwa etc) on my current pipeline or suggest new tools that can accomplish the tasks. Thank you!

ADD COMMENTlink modified 2.9 years ago by Michael Dondrup47k • written 2.9 years ago by cwwong130
gravatar for Michael Dondrup
2.9 years ago by
Bergen, Norway
Michael Dondrup47k wrote:

That is caused BWA which outputs only one of multiple hits with equal scores. It uses the X0 tag to tell you the number of total hits, but I do not know a way to extract all matching positions. You have to use a different aligner, if you need this information in the sam/bam-output.

ADD COMMENTlink modified 2.9 years ago • written 2.9 years ago by Michael Dondrup47k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1634 users visited in the last hour