I'm analysing Oxford Nanopore DNA sequencing data from human cells, which I align to the UCSC hg19 reference genome. Most of the time I get a high mapping percentage, but in a few cases I get mapping percentages below 5%.
Of course I can tune the minimap2 parameters a bit, but for these samples I never get more than 5% of the reads mapped. The basic mapping is done as follows:
minimap2 -t 8 -ax map-ont --secondary=no hg19.25chr.mmi xx.fastq | samtools sort - > xx.bam
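For reference, here is a minimal sketch of how a per-read mapping percentage can be computed from the alignments. The function name mapped_percent is made up for illustration. It assumes SAM text on stdin and counts each read once by skipping secondary (0x100) and supplementary (0x800) records.

```shell
# Hypothetical helper: per-read mapping percentage from SAM text on stdin.
# A read counts as mapped when FLAG bit 0x4 (unmapped) is not set.
mapped_percent() {
    awk -F '\t' '
        /^@/ { next }                        # skip SAM header lines
        int($2 / 256) % 2 == 1 { next }      # skip secondary alignments (0x100)
        int($2 / 2048) % 2 == 1 { next }     # skip supplementary alignments (0x800)
        { total++; if (int($2 / 4) % 2 == 0) mapped++ }
        END {
            if (total > 0) printf "%.2f\n", 100 * mapped / total
            else print "NA"
        }
    '
}
```

With samtools installed this could be run as `samtools view xx.bam | mapped_percent`. `samtools flagstat xx.bam` reports similar counts without any custom code.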
For the cases with low mapping percentages, I extracted some of the unmapped reads and, using BLAST, found that the low mapping percentages are due to viral contamination of my samples.
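The extraction step can be sketched roughly like this. The function name unmapped_to_fasta is hypothetical. It assumes SAM text on stdin and writes every record with FLAG bit 0x4 set as a FASTA entry that can be pasted into BLAST.

```shell
# Hypothetical helper: write unmapped SAM records (FLAG bit 0x4 set) as FASTA.
# Reads SAM text on stdin; column 1 is the read name, column 10 the sequence.
unmapped_to_fasta() {
    awk -F '\t' '
        /^@/ { next }                                    # skip SAM header lines
        int($2 / 4) % 2 == 1 { print ">" $1; print $10 } # unmapped read -> FASTA
    '
}
```

For example, `samtools view xx.bam | unmapped_to_fasta | head -n 200 > unmapped_sample.fa` would keep the first 100 unmapped reads (the output file name is just a placeholder). `samtools fasta -f 4 xx.bam` should give the same result directly.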
Does anyone know of a better method or database for assessing which bacteria, viruses, or other species the unmapped reads come from, instead of identifying the contaminating species by BLASTing a random subset of reads and then realigning to the respective reference genomes?
Thanks for your help.