Dear all, I am trying to assemble a phage (virus) genome. I've checked the quality of the reads and mapped them to the host genome to remove the host reads. Now I am trying to assemble the unmapped reads, but I get more than 1000 contigs. I have tried different k-mer sizes and different assemblers, but the number of contigs is always more than 1000. I've found that there are chimeric reads, which is why the number of contigs is so large. My question is: how can I identify these chimeric reads and remove them?
1) Have you compared the contigs to the host genome? Removing reads by mapping does not guarantee that all of the host DNA is gone.
2) Have you tried assembling without removing any reads? You may be unintentionally removing some viral reads, and host contigs can always be removed afterwards.
3) Consider using virus-specific assemblers that account for the higher mutation rates of viruses.
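
For point 1, here is a rough sketch of how the contig-versus-host comparison could be summarized. It assumes you have already aligned the contigs against the host genome and saved the hits in the standard 12-column BLAST tabular format (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The function name `host_contigs`, the `contig_lengths` dictionary, and the two thresholds are all made up for illustration, not taken from any tool, so tune them to your data.

```python
from collections import defaultdict


def host_contigs(blast_lines, contig_lengths, min_identity=90.0, min_covered=0.5):
    """Return the IDs of contigs that look like host sequence.

    blast_lines    -- iterable of tab-separated BLAST tabular hit lines
    contig_lengths -- dict mapping contig ID to its length in bases (hypothetical input)
    min_identity   -- ignore hits below this percent identity (assumed threshold)
    min_covered    -- flag a contig if at least this fraction is host-aligned (assumed threshold)
    """
    intervals = defaultdict(list)
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        qseqid = fields[0]
        pident = float(fields[2])
        qstart, qend = int(fields[6]), int(fields[7])
        if pident >= min_identity:
            intervals[qseqid].append((min(qstart, qend), max(qstart, qend)))

    flagged = set()
    for contig, spans in intervals.items():
        # Merge overlapping hits so that no base is counted twice.
        spans.sort()
        covered = 0
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end + 1:
                cur_end = max(cur_end, end)
            else:
                covered += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        covered += cur_end - cur_start + 1
        if covered / contig_lengths[contig] >= min_covered:
            flagged.add(contig)
    return flagged
```

Contigs returned by this function are candidates for removal as host contamination. Contigs that are only partly covered by host hits fall below the threshold and are kept, which is worth a manual look since a phage can carry host-derived sequence.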