Question: How to obtain a novel virus' complete genome from de novo assembly
7 months ago, brianna.flynn (10) wrote:

Hello all,

I have a viral genomics question for you! I am analyzing an RNA-Seq library composed of pooled RNA samples from bumble bees across the United States in order to quantify the diversity of viruses that infect these populations. I assembled the RNA-Seq reads into contigs using the de novo assembly option in CLC Workbench and, after searching for viral contigs using BLAST, found one novel virus candidate. From the BLAST search, I know that the candidate is closely related to a mosquito virus family. Based on an alignment and a search of the NCBI Conserved Domain Database, I roughly know what genome size and which protein families the virus is likely to have. However, because it is a new virus, I have no reference against which to check whether I've obtained the full genome. After aligning it with its close relatives, the novel virus contig is roughly a third of the length of the related viral genomes, which suggests that this contig does not represent the complete genome.

To solve this, I think I need to redo the assembly with a pipeline that is better at recovering viral genomes than CLC Workbench. However, I'm not sure of the best way to proceed. Is there a particularly good de novo assembler for obtaining complete viral genomes?

Is there a way to obtain the complete genome of the novel virus from the RNA-Seq reads, given that I approximately know its size, its close relatives, and (assuming my phylogenetic analysis is correct) which conserved proteins it should have? Can I map the reads onto a close relative, despite not having an exact reference?

Any suggestions on how to proceed would be greatly appreciated! Thank you in advance, Brianna

virus rna-seq assembly • 316 views

What all is expected to be in the sample you sequenced? Bee RNA + RNA viruses + ?

written 7 months ago by genomax (87k)

Yes, we expect to find bee host RNA, plant RNA and plant RNA viruses (typically from pollen, though some plant viruses do infect bee hosts), and insect-specific RNA viruses.

written 7 months ago by brianna.flynn (10)

7 months ago, Mensur Dlakic (6.0k) wrote:

How sure are you that there are no other viral contigs in the existing assembly? If you embed the contigs by their 4-mer or 5-mer frequencies (t-SNE, UMAP), viral contigs usually stand out on the periphery of the plot, even without BLASTing.
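To make the k-mer frequency idea concrete, here is a minimal sketch of the first step only: turning each contig into a normalized 4-mer frequency vector. It is an illustration under stated assumptions, not a tested pipeline. The function names `parse_fasta` and `kmer_profile` are made up for this example, and windows containing ambiguous bases such as N are simply ignored.

```python
from collections import Counter
from itertools import product


def parse_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def kmer_profile(seq, k=4):
    """Return normalized k-mer frequencies in a fixed (lexicographic) k-mer order.

    Windows containing anything other than A, C, G, T are not counted.
    """
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA alphabets alike
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


if __name__ == "__main__":
    # Toy in-memory input; a real run would read the assembler's contig FASTA,
    # e.g. open("contigs.fasta") (hypothetical file name).
    toy = [">contig_1", "ACGTACGTTTGACCA", ">contig_2", "GGGCCCGGGCCCAAT"]
    names, matrix = [], []
    for name, seq in parse_fasta(toy):
        names.append(name)
        matrix.append(kmer_profile(seq, k=4))
    print(len(matrix), "contigs x", len(matrix[0]), "k-mer features")
```

The rows of `matrix` (one per contig, 256 columns for k=4) are what would then go into an embedding, for example scikit-learn's `TSNE` or umap-learn's `UMAP`. Short contigs give noisy profiles, so filtering by a minimum length before embedding is probably sensible.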

Some of your questions are answered in this thread. If you expect to have certain proteins, PLASS may help your assembly.


Hey Mensur, I didn't clarify this in the parent post, but we did find other viral contigs (known bee, insect, and plant viruses). The one viral contig I mainly refer to is the one that we think is a new species. Thank you for the suggestions! I'll look into using PLASS.

written 7 months ago by brianna.flynn (10)