How can I calculate the percent of bases covered in the query assembly relative to the reference assembly?
3 months ago
O.rka ▴ 710

I have a bunch of viruses I've assembled from metagenomic data, and I have the host genome as well. I want to see what proportion of each virus aligns to the host genome. I ran minimap2 with the host genome as the reference and can calculate the percent of the host genome covered using samtools coverage, but how can I do the reverse, i.e., get the percent of each viral genome covered? Would it make sense to use the viral genomes as the reference instead? If not, is there a way to do this with the SAM files I already have?
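For the last part of the question, here is a minimal sketch. It assumes plain-text SAM from minimap2 (`-a`) with the viruses as queries and the host as reference. The aligned query interval of each record is derived from its CIGAR string: leading soft/hard clips give the offset, and reverse-strand records are flipped back to the original orientation. Overlapping intervals from primary and supplementary alignments are then merged, and the merged length is divided by the query length. Secondary alignments (flag 256) are skipped by default. The function names and file names are illustrative and are not part of samtools or minimap2.

```python
import re
import sys
from collections import defaultdict

# CIGAR as (length, op) pairs, e.g. "10S50M" -> [("10", "S"), ("50", "M")]
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def query_interval(cigar, flag):
    """Return (query_length, start, end) of the aligned part of the query,
    0-based half-open, in the query's original (FASTA) orientation."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Full query length: query-consuming ops plus hard clips
    qlen = sum(n for n, op in ops if op in "MIS=XH")
    # Leading soft/hard clips give the offset where the alignment starts
    lead = 0
    for n, op in ops:
        if op not in "SH":
            break
        lead += n
    aligned = sum(n for n, op in ops if op in "MI=X")
    start, end = lead, lead + aligned
    if flag & 16:
        # Reverse-strand record: CIGAR follows the reference strand, so flip
        start, end = qlen - end, qlen - start
    return qlen, start, end


def merged_length(intervals):
    """Total length of half-open intervals, counting overlaps once."""
    total, cur_s, cur_e = 0, None, None
    for s, e in sorted(intervals):
        if cur_e is None or s > cur_e:
            if cur_e is not None:
                total += cur_e - cur_s
            cur_s, cur_e = s, e
        else:
            cur_e = max(cur_e, e)
    if cur_e is not None:
        total += cur_e - cur_s
    return total


def query_coverage(sam_lines, include_secondary=False, min_mapq=0):
    """Map query name -> (bases covered, query length, fraction covered)."""
    intervals = defaultdict(list)
    qlens = {}
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue
        qname, flag, mapq = fields[0], int(fields[1]), int(fields[4])
        cigar, seq = fields[5], fields[9]
        if flag & 4 or cigar == "*":
            if seq != "*":
                qlens.setdefault(qname, len(seq))  # unmapped: 0 bases covered
            continue
        qlen, start, end = query_interval(cigar, flag)
        qlens[qname] = qlen  # recorded even if the record is filtered below
        if (flag & 256 and not include_secondary) or mapq < min_mapq:
            continue
        intervals[qname].append((start, end))
    result = {}
    for q, n in qlens.items():
        covered = merged_length(intervals.get(q, []))
        result[q] = (covered, n, covered / n if n else 0.0)
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        for q, (covered, n, frac) in sorted(query_coverage(fh).items()):
            print(f"{q}\t{covered}\t{n}\t{100 * frac:.2f}")
```

Usage would be something like `python query_coverage.py viruses_vs_host.sam` (hypothetical file name). For BAM input on Linux, `samtools view -h aln.bam` can be piped in and read via `/dev/stdin`. Alternatively, minimap2's default PAF output already reports query length, start and end for each alignment (columns 2 to 4), so the same interval merge could be applied to it directly.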

samtools • 573 views
3 months ago
dthorbur ★ 1.9k

I assume you aligned the chromosomes/scaffolds/contigs of your host against the viral genomes, which I think oversimplifies the results, as whole chromosomes are likely not integrated in a single event. I think you would get more appropriate results (depending on your question) using a k-mer-based synteny tool. I am only familiar with Satsuma2, but it is a little old, so I'm unsure whether better options are available now.

You could use the satsuma_summary.chained.out output file, which should be easy to parse to identify which regions of the genomes are similar and how similar they are. Some inclusion thresholds (e.g., a minimum sequence identity) would be sensible to remove putatively spurious hits with low sequence identity.
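Parsing that file could look roughly like the sketch below. The column layout is an assumption (whitespace-separated: target name, start, end, query name, start, end, identity, strand). Check it against your own output and adjust the `*_COL` constants, including which side holds the viral sequences. Hits below a chosen identity or length are dropped, the remaining intervals are merged per sequence, and the covered length is divided by the sequence length read from a FASTA file. The thresholds and function names are illustrative. Coordinates are treated as half-open here, which may be off by one for Satsuma's convention.

```python
from collections import defaultdict

# ASSUMED 0-based column indices in satsuma_summary.chained.out:
# target, t_start, t_end, query, q_start, q_end, identity, strand.
# Verify against your own file before trusting the numbers.
NAME_COL, START_COL, END_COL, IDENT_COL = 3, 4, 5, 6


def fasta_lengths(lines):
    """Sequence name (first word of the header) -> sequence length."""
    lengths, name = {}, None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def merged_length(intervals):
    """Total length of half-open intervals, counting overlaps once."""
    total, cur_s, cur_e = 0, None, None
    for s, e in sorted(intervals):
        if cur_e is None or s > cur_e:
            if cur_e is not None:
                total += cur_e - cur_s
            cur_s, cur_e = s, e
        else:
            cur_e = max(cur_e, e)
    if cur_e is not None:
        total += cur_e - cur_s
    return total


def satsuma_coverage(hit_lines, seq_lengths, min_identity=0.0, min_length=0):
    """Sequence name -> fraction of its length covered by retained hits."""
    intervals = defaultdict(list)
    needed = max(NAME_COL, START_COL, END_COL, IDENT_COL)
    for line in hit_lines:
        fields = line.split()
        if len(fields) <= needed:
            continue
        start, end = sorted((int(fields[START_COL]), int(fields[END_COL])))
        if float(fields[IDENT_COL]) < min_identity or end - start < min_length:
            continue  # drop putatively spurious hits
        intervals[fields[NAME_COL]].append((start, end))
    return {
        name: merged_length(intervals.get(name, [])) / n if n else 0.0
        for name, n in seq_lengths.items()
    }
```

Sequences with no retained hits are reported as 0.0, so every viral genome in the FASTA appears in the result. The identity threshold is compared on whatever scale the file uses (e.g., 0 to 1 or a percentage).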
