I am trying to find viruses in next-generation RNA-seq data. However, I would like to find viruses that are not necessarily known to science. They don't have to be completely new, just different strains or sequences that might not exactly match anything in the sequence databases. Does anyone have advice on the best way to go about looking for these viruses? I was thinking about BLASTing against EST databases and using PHI-BLAST, the logic being that EST libraries contain viral sequences that are not annotated as viral. Any feedback on this, or on the PHI-BLAST (or BLAST) settings and parameters that would be ideal, would be greatly appreciated! Thanks for your time.
Maybe start by aligning the reads to the host transcriptome, then pull out all the reads that don't align and figure out what they are. If you have enough reads from the same virus, de novo assembly will give you novel sequences; alternatively, you can BLAST or align the unaligned reads against a database of known viruses and work from there.
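The "pull out all the reads that don't align" step can be sketched as below. This is a minimal illustration, not a tested pipeline: it assumes your aligner wrote SAM text with quality strings present, reads each record's FLAG field, and keeps primary records with the "unmapped" bit (0x4) set, writing them as FASTQ for assembly or BLAST. The function names `unmapped_reads` and `to_fastq` are hypothetical. On real data, `samtools fastq -f 4` on the alignment file does the same job and is the more practical choice; paired-end mate handling is not covered here.

```python
from typing import Iterable, Iterator, Tuple

# SAM FLAG bits (from the SAM format specification)
UNMAPPED = 0x4
REVERSE = 0x10
SECONDARY = 0x100
SUPPLEMENTARY = 0x800

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def unmapped_reads(sam_lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, seq, qual) for primary SAM records flagged as unmapped."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            raise ValueError(f"malformed SAM record: {line!r}")
        name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # keep each read once: only primary records
        if not flag & UNMAPPED:
            continue  # read aligned to the host, so not a candidate
        if flag & REVERSE:
            # SEQ/QUAL are stored reverse-complemented; restore original orientation
            seq = seq.translate(_COMPLEMENT)[::-1]
            qual = qual[::-1]
        yield name, seq, qual


def to_fastq(records: Iterable[Tuple[str, str, str]]) -> str:
    """Format (name, seq, qual) tuples as FASTQ text."""
    return "".join(f"@{n}\n{s}\n+\n{q}\n" for n, s, q in records)
```

For example, `open("unmapped.fq", "w").write(to_fastq(unmapped_reads(open("aligned.sam"))))` would collect the candidate viral reads from a hypothetical `aligned.sam` into a FASTQ file that you can then assemble or search.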