I have a viral genomics question for you! I am analyzing an RNA-Seq library composed of pooled RNA samples from bumble bees across the United States, in order to quantify the diversity of viruses that infect these populations. I assembled the RNA-Seq reads into contigs using the de novo assembly option in CLC Workbench and, after searching for viral contigs using BLAST, found one novel virus candidate. From the BLAST search, I know that the candidate is closely related to a mosquito virus family. Based on an alignment and a search of the NCBI Conserved Domain Database, I roughly know the expected genome size and which protein families the virus is likely to encode. However, because it is a new virus, I have no reference against which to check whether I have obtained the full genome. When I align it with its close relatives, the novel virus contig is roughly a third of the length of the related viral genomes, which suggests that this contig does not represent the complete genome.
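For context, the length comparison I describe above is simple arithmetic, roughly sketched below in Python. The function name `completeness_estimate` and all of the lengths are hypothetical placeholders for illustration, not my actual data, and I am assuming that the median length of the relatives is a reasonable stand-in for the expected genome size.

```python
from statistics import median


def completeness_estimate(contig_len, relative_lens):
    """Return the contig length as a fraction of the median relative genome length."""
    if not relative_lens:
        raise ValueError("need at least one relative genome length")
    return contig_len / median(relative_lens)


# Hypothetical lengths in nucleotides, for illustration only (not real data).
relatives = [9000, 9600, 10200]
contig = 3200

fraction = completeness_estimate(contig, relatives)
print(f"contig covers about {fraction:.0%} of the median relative genome")
```

A fraction well below 1 only hints that the contig is incomplete; it says nothing about which part of the genome is missing.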
To solve this issue, I think I need to redo the assembly with a pipeline that is better suited to recovering viral genomes than CLC Workbench. However, I'm not sure of the best way to proceed. Is there a de novo assembler that is particularly good at recovering complete viral genomes?
I would also like to know whether there is a way to obtain the complete genome of the novel virus from the RNA-Seq reads, given that I know its approximate size, its close relatives, and which conserved proteins it should encode (assuming my phylogenetic analysis is correct). Can I map the reads onto a close relative, even though I do not have an exact reference to use?
Any suggestions on how to proceed would be greatly appreciated! Thank you in advance, Brianna