Question: Grabbing the beginning and end of a viral genome in de novo sequencing
4.5 years ago, ebrahimiet wrote:

Hi all,

I am performing de novo genome assembly of NDV (Newcastle disease virus) from paired-end 150 bp Illumina reads. NDV has a negative-stranded, linear RNA genome about 15 kb in size. When I compare the final contig with the full genome deposited in NCBI, I see that the beginning (leader/promoter) and the end of the RNA genome are missing from the final de novo assembled contig. How can I enrich the contigs for the beginning and end of the viral genome?

Many thanks


Tag: assembly

RNA-Seq tends to perform poorly at the ends of viral genomes (for a range of reasons). There's nothing you can do to enrich for reads in those regions, because those reads don't exist.

If you want the complete ends, you'll have to use RACE (rapid amplification of cDNA ends) or something similar.

Reply written 4.5 years ago by pld

@joe: I don't want to hijack this thread, but do you have or know of references that show the poor performance of viral RNA-seq?

Reply written 4.5 years ago by GenoMax

Performance isn't poor overall; it works very well. The genomic termini just tend to be a pain in the ass, and you usually have to go after them with RACE or something similar. This may not apply to all types of viruses, but it does seem to be the case for (+/-)ssRNA viruses (except maybe Deltaviruses). The explanation I've always been given is that the typically large amount of secondary/tertiary structure in these regions causes problems during cDNA synthesis.

I've certainly seen it in every +/-ssRNA virus we've sequenced. I can get 10,000-150,000x coverage across the interior of the genome, but toward the 5' or 3' ends the coverage drops rapidly and I'm usually missing the first/last 50-200 bp. It does seem to be a function of depth: the deeper the sequencing, the more of the termini tends to be covered.
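To put numbers on that terminal drop-off, here is a minimal Python sketch (not from the thread) that counts how many consecutive bases at each end of a contig sit below a depth threshold. It assumes per-base depth for a single contig in the three-column, tab-separated layout written by `samtools depth -a` (reference name, 1-based position, depth); the helper names and the default `min_depth=10` are hypothetical choices for illustration.

```python
# Hypothetical helpers: quantify low-coverage stretches at the contig ends.

def parse_depth(lines):
    """Parse `samtools depth -a`-style lines (name, 1-based pos, depth)
    for a single contig into a list of integer depths, in file order."""
    depths = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        _name, _pos, depth = line.rstrip("\n").split("\t")
        depths.append(int(depth))
    return depths


def terminal_low_coverage(depths, min_depth=10):
    """Return (n_start, n_end): the number of consecutive bases at the start
    and at the end of the contig whose depth is below min_depth."""
    n_start = 0
    for d in depths:
        if d >= min_depth:
            break
        n_start += 1
    if n_start == len(depths):
        # The whole contig is below threshold; report it for both ends.
        return n_start, n_start
    n_end = 0
    for d in reversed(depths):
        if d >= min_depth:
            break
        n_end += 1
    return n_start, n_end


if __name__ == "__main__":
    toy = [0, 0, 1, 3, 12, 500, 800, 600, 9, 2]
    print(terminal_low_coverage(toy, min_depth=10))  # (4, 2)
```

Running it on the same library sequenced at different depths is one way to check the observation that deeper runs cover more of the termini; note that "start" and "end" here follow the contig's orientation, which for a negative-stranded virus may be genome or antigenome sense.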

There isn't much published on why, but genome sequences aren't usually considered complete without RACE to obtain the termini.
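Once a terminus has been recovered by RACE, that sequence still has to be joined to the assembled contig. A minimal sketch, assuming the RACE product has already been sequenced, trimmed of adapter/primer sequence, and oriented the same way as the contig: find the longest exact suffix/prefix overlap and add only the new bases. The helper names and the `min_overlap` default are hypothetical, and exact matching is a simplification; real reads would need tolerance for sequencing errors (or an alignment against the contig instead).

```python
# Hypothetical helpers: join a RACE-derived terminal sequence to a contig
# using the longest exact overlap (no mismatches allowed).

def extend_contig_5prime(contig, race_seq, min_overlap=20):
    """If a suffix of race_seq equals a prefix of contig (at least
    min_overlap bases), return race_seq's extra bases + contig; else None."""
    max_k = min(len(contig), len(race_seq))
    for k in range(max_k, max(min_overlap, 1) - 1, -1):  # longest first
        if race_seq[-k:] == contig[:k]:
            return race_seq[:-k] + contig
    return None


def extend_contig_3prime(contig, race_seq, min_overlap=20):
    """If a suffix of contig equals a prefix of race_seq (at least
    min_overlap bases), return contig + race_seq's extra bases; else None."""
    max_k = min(len(contig), len(race_seq))
    for k in range(max_k, max(min_overlap, 1) - 1, -1):  # longest first
        if contig[-k:] == race_seq[:k]:
            return contig + race_seq[k:]
    return None


if __name__ == "__main__":
    contig = "ACGTTGCAAGGCTTAACCGGTT"
    race5 = "TTTTGGG" + contig[:10]  # 7 new bases + 10-base overlap
    race3 = contig[-10:] + "CCCAAA"  # 10-base overlap + 6 new bases
    print(extend_contig_5prime(contig, race5, min_overlap=8))
    print(extend_contig_3prime(contig, race3, min_overlap=8))
```

With these toy sequences the first call returns `"TTTTGGG" + contig` and the second returns `contig + "CCCAAA"`; if the overlap is shorter than `min_overlap`, the functions return `None` rather than guessing a join.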

Reply written 4.5 years ago by pld

I am currently working with someone trying to define the ends of some transcripts for a virus (not at the beginning/end of the genome), and the RNA-seq data has been partially inconclusive: there is no smoking-gun 5' start, though things are better at the 3' end. Viruses are so gene-rich that it is difficult to tease individual transcripts apart. I suspected that something like RACE might be needed to nail down the starts, since RNA-seq alone does not seem to cut it.

Thanks for the papers and your answer.

Reply written 4.5 years ago by GenoMax (modified 4.5 years ago)

It doesn't surprise me that the viral transcripts are giving you problems. If your virus polyadenylates its mRNAs, you might have an easier time doing RACE for the 3' end.

Reply written 4.5 years ago by pld


Powered by Biostar version 2.3.0