I would like to estimate the hg19/virus read ratio for Illumina paired-end reads. My first step was to remove the hg19 reads (using samtools with -f 4, which keeps only the unmapped reads) and then align the remaining reads against the virus.
I noticed that, with the same parameters, I get different results for the same run depending on whether I process the reads as paired (50% hg19, 50% virus) or as single reads (3% hg19, 97% virus).
bwa mem -k 20 hg19.fasta singlereads.fastq > out.sam
bwa mem -k 20 hg19.fasta paired1.fastq paired2.fastq > out2.sam
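A minimal sketch of how the mapped fraction could be computed directly from out.sam and out2.sam, assuming standard SAM flag semantics (0x4 unmapped, 0x100 secondary, 0x800 supplementary). The function name `count_mapped` is hypothetical and only for illustration; it is not necessarily how the percentages above were obtained.

```shell
# Hypothetical helper: count primary mapped vs. unmapped records in a SAM file.
# Header lines (@...) are skipped, and secondary (0x100) and supplementary
# (0x800) records are ignored so that each read is counted only once.
count_mapped() {
    awk -F'\t' '
        /^@/ { next }
        {
            flag = $2
            if (int(flag / 256) % 2 == 1) next    # secondary alignment
            if (int(flag / 2048) % 2 == 1) next   # supplementary alignment
            if (int(flag / 4) % 2 == 1) unmapped++
            else mapped++
        }
        END {
            total = mapped + unmapped
            pct = (total > 0) ? 100 * mapped / total : 0
            printf "mapped=%d unmapped=%d pct_mapped=%.1f\n", mapped, unmapped, pct
        }
    ' "$1"
}
```

Running `count_mapped out.sam` and `count_mapped out2.sam` would give comparable numbers for the two runs. Note that in the paired output each mate is its own record, so the counts are per read, not per pair.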
For the paired reads, I BLASTed the reads that aligned to hg19, and it turns out that these "human" reads are actually viral.
I noticed from the CIGAR strings that many of the reads mapped to human have a matched length close to the seed length of 20. Also, if I raise the minimum seed length to 50 (-k 50), I get the expected ratio (3% hg19, 97% virus).
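To check how close the human alignments are to the seed length, the aligned length can be summed from each CIGAR string. The sketch below is one possible way to do this with awk, assuming a standard SAM layout (flag in column 2, CIGAR in column 6); the function name `cigar_match_len` is hypothetical.

```shell
# Hypothetical helper: for every mapped record in a SAM file, print the read
# name and the total length of its aligned (M, =, X) CIGAR operations.
cigar_match_len() {
    awk -F'\t' '
        /^@/ { next }                    # skip header lines
        int($2 / 4) % 2 == 1 { next }    # skip unmapped records (flag 0x4)
        {
            cigar = $6
            total = 0
            # Consume the CIGAR string one <length><operation> pair at a time.
            while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
                len = substr(cigar, 1, RLENGTH - 1) + 0
                op = substr(cigar, RLENGTH, 1)
                if (op == "M" || op == "=" || op == "X") total += len
                cigar = substr(cigar, RLENGTH + 1)
            }
            print $1 "\t" total
        }
    ' "$1"
}
```

For example, `cigar_match_len out2.sam | cut -f2 | sort -n | uniq -c` would give a histogram of aligned lengths, which should show whether the hg19 hits cluster just above 20 bases.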
There is only limited homology between hg19 and the virus, and this virus is not integrated into the human genome.
Can someone explain why, with the same seed length, I do not get the same results when I supply my reads as single or as paired?