I’m not sure if this is the appropriate term, but the only way I can think of doing this is to convert a BAM file to a BED file, make an array of length N (where N is the size of the genome), add up the reads covering each position, and then take the fraction of positions with a nonzero count. That sounds very memory intensive, so I’m wondering if there’s a better way.
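For reference, the same ratio can be computed without a genome-length array by merging overlapping intervals per contig and summing the merged lengths. This is only a minimal sketch: the function names `covered_bases` and `breadth` are made up for illustration, and it assumes the BED records are already parsed into (contig, start, end) tuples with 0-based, half-open coordinates, plus a dict of contig lengths.

```python
from collections import defaultdict


def covered_bases(intervals):
    """Count positions covered by at least one half-open interval [start, end)."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Gap before this interval: close the previous merged block.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def breadth(bed_records, contig_lengths):
    """Fraction of all bases in contig_lengths covered by at least one record."""
    by_contig = defaultdict(list)
    for contig, start, end in bed_records:
        by_contig[contig].append((start, end))
    covered = sum(covered_bases(ivs) for ivs in by_contig.values())
    return covered / sum(contig_lengths.values())


# Hypothetical toy data: two contigs of 100 bp each.
records = [("c1", 0, 10), ("c1", 5, 20), ("c2", 0, 5)]
lengths = {"c1": 100, "c2": 100}
print(breadth(records, lengths))
```

Memory here scales with the number of intervals rather than with the genome size, although sorting every read interval can still be slow for large BAM files.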
I have the following files:
- BAM files of reads mapped to a metagenome of contigs from different metagenome-assembled genomes (MAGs)
- A table of identifiers [id_contig]<tab>[id_mag]
- A fasta file with all of the contigs
I see that there is samtools coverage, but I don't know how to get coverage for only certain contigs in the BAM file. I also found bedtools genomecov, but it's a little confusing how I can adapt it to my data.
What I'm ultimately looking for is the following table:
               [mag_1]  [mag_2]  ...  [mag_m]
[bam_file_1]
[bam_file_2]
...
[bam_file_n]
where each value in the matrix is the percent of that MAG's genome covered by reads in the corresponding BAM file.
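To make the target concrete, here is a rough sketch of how one row of that table could be built from the tab-separated output of samtools coverage for a single BAM file. It assumes the default tabular output (a header line starting with `#rname`, then rname, startpos, endpos, numreads, covbases, ...) and a contig-to-MAG dict loaded from the identifier table. The function name `mag_breadth` and the toy values are hypothetical.

```python
import csv
from collections import defaultdict


def mag_breadth(coverage_lines, contig_to_mag):
    """Percent of each MAG covered, from per-contig samtools coverage rows."""
    covered = defaultdict(int)
    length = defaultdict(int)
    for row in csv.reader(coverage_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and the header
        contig, start, end, covbases = row[0], int(row[1]), int(row[2]), int(row[4])
        mag = contig_to_mag.get(contig)
        if mag is None:
            continue  # contig is not assigned to any MAG
        covered[mag] += covbases
        length[mag] += end - start + 1  # startpos and endpos are 1-based, inclusive
    return {mag: 100.0 * covered[mag] / length[mag] for mag in length}


# Hypothetical toy input standing in for one BAM file's coverage output.
lines = [
    "#rname\tstartpos\tendpos\tnumreads\tcovbases\tcoverage\tmeandepth\tmeanbaseq\tmeanmapq\n",
    "c1\t1\t100\t5\t50\t50\t1.0\t30\t60\n",
    "c2\t1\t300\t5\t30\t10\t0.5\t30\t60\n",
    "c3\t1\t200\t2\t200\t100\t2.0\t30\t60\n",
]
mapping = {"c1": "mag_1", "c2": "mag_1", "c3": "mag_2"}
print(mag_breadth(lines, mapping))
```

Summing covered bases and contig lengths per MAG, instead of averaging the per-contig percentages, keeps long and short contigs weighted correctly. Repeating this for each BAM file would give one row per file.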