I am interested to see the coverage of a set of genes in my metagenomes (they are unassembled metagenomes).
My plan was to map my raw reads to a database of bacterial gene sequences that I curated (specifically alcohol dehydrogenase aka adhE).
After trimming and removing human-host reads with KneadData, I am using bowtie2 to map my raw reads from the metagenomes to this gene database I've made.
I am trying to determine how many reads map to these genes in my metagenomes (i.e., depth of coverage). I know that samtools depth gives me the number of reads mapped to each position, and I know that samtools coverage gives me how many bases in the reference are covered by reads in my metagenomes.
My question is: if I am interested in how these genes are distributed and their prevalence in my metagenomes, I should use the meandepth column in the output from samtools coverage, correct? Or do I instead sum up the reads by position for each gene from samtools depth, then divide that by the length of the reference gene? I was using the meandepth result from samtools coverage as a metric for gene representation in my metagenomes, but am second guessing myself.
Thank you for your input!
if you use the same filtering and
samtools depth -a
there should be no difference isn't it ? samtools coverage would be faster.