I am new to bioinformatics and currently working on a project need to calculate the coverage of each gene from a metagenomic data set. However, there are several questions cannot be solved after I went through many posts from this platform and other websites (including the manuals of samtools, bwa, bedtools)...
My goal is to:
- Map metagenomic sequencing data with bwa aln (it is based on one of the references' method) against a custom reference genome
- Calculate the coverage of each gene from the reference genome
However, for the second step, I expect that some of the reads will be mapped to multiple regions in the reference genome. Therefore, I would like to ask are there any method to get the coverage that is weighted by the best tied hits? In other words, if a read mapped to 2 regions in total (these two regions have same mapping score), they both contribute 50% when calculating coverage.
I found that there is a X0 flag in the sam output from bwa, and I previously use this to retrieve how many times do that sequence mapped. However, when I was using bedtools, it seems to discard all alternative hits that have the same score (it count only the first region showed in sam file in standard output).
I am not sure whether my work flow is alright, thus it would be great if you could provide some suggestions (any parameters or flags should be used during samtools, bedtools, bwa etc) on my current pipeline or suggest new tools that can accomplish the tasks. Thank you!