Question: Can RNA-seq Reads Be Used For Genome Assembly?
Lhl730 (United States) wrote, 6.6 years ago:

Hi There,

I have been doing a genome assembly for a non-model plant species with Velvet, using one paired-end library (insert size = 500 bp). The assembly is very fragmented.

Since I also have paired-end RNA-seq data, I am wondering if I can use it to improve the genome assembly.

My questions are: (1) Does it make sense to map a transcriptome assembly to the genome assembly and join the genome scaffolds/contigs that are spanned by the same transcript?
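A minimal sketch of idea (1). It assumes the assembled transcripts have already been aligned to the contigs (for example with BLAT, which the SCUBAT tool mentioned in this thread uses) and reduced to one hypothetical Hit record per aligned segment. Consecutive segments of the same transcript that land on different contigs suggest a join, and the number of distinct transcripts supporting a join serves as a crude confidence score. This illustrates the general idea only; it is not the algorithm of any particular tool.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple

class Hit(NamedTuple):
    """One aligned segment of an assembled transcript (hypothetical input format)."""
    transcript: str
    contig: str
    t_start: int  # where the segment starts on the transcript
    strand: str   # '+' if the contig is in the same orientation as the transcript

Link = Tuple[str, str, str, str]  # (contig_a, strand_a, contig_b, strand_b)

def _flip(strand: str) -> str:
    return "-" if strand == "+" else "+"

def contig_links(hits: Iterable[Hit], min_support: int = 1) -> Dict[Link, int]:
    """Count how many distinct transcripts support each contig-to-contig join."""
    by_transcript = defaultdict(list)
    for h in hits:
        by_transcript[h.transcript].append(h)
    support = defaultdict(set)
    for tid, segs in by_transcript.items():
        segs.sort(key=lambda h: h.t_start)  # walk along the transcript
        for a, b in zip(segs, segs[1:]):
            if a.contig == b.contig:
                continue  # e.g. two exons on the same contig: no join implied
            link = (a.contig, a.strand, b.contig, b.strand)
            # The same join seen from the other end of the transcript:
            rev = (b.contig, _flip(b.strand), a.contig, _flip(a.strand))
            support[min(link, rev)].add(tid)  # store one canonical orientation
    return {link: len(t) for link, t in support.items() if len(t) >= min_support}
```

In practice you would also want to drop joins contradicted by other transcripts, since paralogs and chimeric transcript assemblies can produce false links.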

(2) Can I use RNA-seq reads for the same purpose? I mean using RNA-seq reads as genomic reads in a de novo assembly. Since RNA-seq PE reads come from alternatively spliced transcripts, I would use them as single-end reads in the genome assembly.
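A sketch of the step in idea (2) where paired-end RNA-seq reads are treated as single-end: every mate becomes an independent record, with /1 or /2 kept in the name so that IDs stay unique. It assumes plain 4-line FASTQ records; the file names in the comment are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (name, sequence, quality)

def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Parse 4-line FASTQ records; an incomplete trailing record is ignored."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, 4))
        if len(chunk) < 4:
            return
        header, seq, _plus, qual = (s.rstrip("\n") for s in chunk)
        yield header[1:].split()[0], seq, qual  # drop '@' and any description

def pairs_to_single(r1: Iterable[str], r2: Iterable[str]) -> Iterator[str]:
    """Yield every mate from both files as an unpaired FASTQ record."""
    for suffix, lines in (("/1", r1), ("/2", r2)):
        for name, seq, qual in read_fastq(lines):
            if name.endswith(("/1", "/2")):
                name = name[:-2]  # avoid names like read/1/1
            yield f"@{name}{suffix}\n{seq}\n+\n{qual}\n"

# Example (hypothetical file names):
# with open("rna_R1.fq") as r1, open("rna_R2.fq") as r2, open("rna_se.fq", "w") as out:
#     out.writelines(pairs_to_single(r1, r2))
```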

I would very much appreciate it if someone could tell me whether these ideas are reasonable, or give me additional suggestions.

Kind Regards,


modified 2.6 years ago by balaji40 • written 6.6 years ago by Lhl730

Highly fragmented assemblies with Velvet are not uncommon. Before I can advise further: 1) Is this Illumina sequencing? Is it MiSeq by any chance? What is the coverage? 2) Have you tried VelvetOptimiser?
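VelvetOptimiser automates a search over Velvet parameters such as k. A simple manual version is to assemble at several k values and compare contiguity statistics such as N50. Below is a minimal sketch of N50 and of picking k by it, with each assembly given as a plain list of contig lengths (a hypothetical input format). N50 measures contiguity only, not correctness, so a higher N50 does not by itself mean a better assembly.

```python
from typing import Dict, Iterable, List

def n50(lengths: Iterable[int]) -> int:
    """Length of the contig at which the running total, taken from the longest
    contig down, first reaches half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly

def best_k(assemblies: Dict[int, List[int]]) -> int:
    """Pick the k whose assembly (list of contig lengths) has the highest N50."""
    return max(assemblies, key=lambda k: n50(assemblies[k]))
```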

And to answer your question directly, I wouldn't use RNA-seq reads for genome assembly. Think about it: if it's a eukaryote, its genes can have introns, so a read that spans an exon-exon junction joins two sequences that are not adjacent in the genome. Even if it's not, we are talking about gene duplications and repetitive regions...
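A toy illustration of the intron problem, using made-up sequences: a read taken from the spliced mRNA across an exon-exon junction contains k-mers that never occur in the genome, so in a de Bruijn graph assembler such as Velvet those k-mers would add a path that skips the intron. Only the forward strand is checked, for brevity.

```python
def kmers(seq: str, k: int) -> set:
    """All overlapping k-mers of seq (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def novel_kmers(read: str, genome: str, k: int) -> set:
    """k-mers present in the read but absent from the genome."""
    return kmers(read, k) - kmers(genome, k)

# Made-up two-exon gene; the intron starts with GT and ends with AG.
exon1 = "ATGGCTAGCTTACG"
intron = "GTAAGTTTTTTTCAG"
exon2 = "GATCCGTAGGCTAA"
genome = exon1 + intron + exon2
mrna = exon1 + exon2            # intron spliced out
junction_read = mrna[6:22]      # 8 bases from each exon
exon_read = exon1               # lies entirely within one exon

if __name__ == "__main__":
    print("junction read:", sorted(novel_kmers(junction_read, genome, 8)))
    print("exon read:", sorted(novel_kmers(exon_read, genome, 8)))
```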

written 6.6 years ago by Adrian Pelin

Thanks akoik063.

It is Illumina HiSeq sequencing. About the coverage, Velvet produced the following message (k=57): 'Median coverage depth = 4.293333 Final graph has 5301633 nodes and n50 of 1063, max 70820, total 613220706, using 78039534/136079432 reads'. I mapped the reads back to the assembly and, from my calculations, got an average coverage of 45. I have NOT tried VelvetOptimiser; I simply tried multiple k-mer sizes and found that k=57 gave the largest N50.
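Velvet reports k-mer coverage rather than base coverage, which is part of why its 'Median coverage depth' looks much lower than the 45 from read mapping. The Velvet manual relates the two as Ck = C * (L - k + 1) / L, where L is the read length. A minimal sketch of the conversion follows. The read length is not stated in the thread, so 100 bp below is purely an assumption, and the two figures need not agree anyway, since one is a median over graph nodes and the other an average from mapped reads.

```python
def kmer_to_base_coverage(kmer_cov: float, read_len: int, k: int) -> float:
    """Invert Ck = C * (L - k + 1) / L to get nucleotide coverage C from k-mer coverage Ck."""
    if not 0 < k <= read_len:
        raise ValueError("k must be between 1 and the read length")
    return kmer_cov * read_len / (read_len - k + 1)

if __name__ == "__main__":
    READ_LEN = 100  # ASSUMPTION: the thread does not state the read length
    print(f"base coverage ~ {kmer_to_base_coverage(4.293333, READ_LEN, 57):.1f}x")
    # Fraction of reads Velvet reported using in the final graph:
    print(f"reads used: {78039534 / 136079432:.1%}")
```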

written 6.6 years ago by Lhl730
balaji40 wrote, 2.6 years ago:

I came across some more tools (some are also mentioned above):

AGOUTI: improving genome assembly and annotation using transcriptome data

Rascaf: Improving Genome Assembly with RNA Sequencing Data

PEP_scaffolder: using (homologous) proteins to scaffold genomes

SCUBAT (Scaffolding Contigs Using BLAT And Transcripts)

L_RNA_scaffolder: scaffolding genomes with transcripts

written 2.6 years ago by balaji40
Sean Davis (National Institutes of Health, Bethesda, MD) wrote, 6.6 years ago:

This is NOT an evidence-based answer and represents only intuition, so I hope someone else has more insight. You propose an interesting idea, but I suspect that you are better off doing further genomic sequencing (potentially with a different library prep or even a different technology). The RNA-seq reads, even used as single-end reads, may themselves span splice junctions, which makes including them problematic. Including the RNA-seq data as paired-end is also difficult, since the insert size distribution is not well understood once introns have been spliced out between the mates.
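A sketch of why mate distances are ill-defined for RNA-seq. It uses a hypothetical plus-strand gene with two exons separated by a 2 kb intron: a 300 bp cDNA fragment that crosses the junction spans about 2.3 kb of genome, while one lying inside an exon spans 300 bp. Coordinates are 0-based with half-open exons, and all numbers are made up.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), 0-based, half-open, on the plus strand

def transcript_to_genome(pos: int, exons: List[Exon]) -> int:
    """Map a 0-based transcript coordinate onto the genome."""
    offsets = [0, *accumulate(end - start for start, end in exons)]
    if not 0 <= pos < offsets[-1]:
        raise ValueError("position lies outside the transcript")
    i = bisect_right(offsets, pos) - 1  # index of the exon containing pos
    return exons[i][0] + (pos - offsets[i])

def genomic_span(frag_start: int, frag_len: int, exons: List[Exon]) -> int:
    """Genomic distance from the first to the last base of a cDNA fragment."""
    first = transcript_to_genome(frag_start, exons)
    last = transcript_to_genome(frag_start + frag_len - 1, exons)
    return last - first + 1

exons = [(1000, 1200), (3200, 3500)]  # hypothetical gene: 200 bp + 300 bp exons, 2 kb intron
```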

written 6.6 years ago by Sean Davis

Thanks Sean. You are right. That's why I said I would use the RNA-seq PE reads as SE reads: the relative positions of mates in RNA-seq differ from those in DNA-seq. But we still need to consider the intron/exon structure issues unique to RNA-seq, as mentioned by akoik063. Cheers

written 6.6 years ago by Lhl730
Damian Kao wrote, 6.6 years ago:

These two tools are supposed to perform this:



Edit: I misread; you want to use RNA-seq reads. These tools use assembled transcripts to attempt scaffolding.

modified 6.6 years ago • written 6.6 years ago by Damian Kao


I have a similar question. I am looking for a de novo assembly tool that can assemble metatranscriptome data (paired-end reads, insert size = 300) consisting of mixed reads from multiple species in a microbial community.

In my search I came across MetaVelvet, a de novo metagenome assembler, but I am not sure whether it works well with my data.

Any suggestions, please?

written 6.6 years ago by bambus072550

Thanks Damian, I will give L-RNA_Scaffolder a go.

written 6.6 years ago by Lhl730

How did L-RNA_Scaffolder perform for you?

written 3.8 years ago by Ric290
Larry_Parnell (Boston, MA USA) wrote, 6.6 years ago:

I would be hesitant to use RNA-Seq reads for this purpose (genome assembly) because you cannot be certain that a read corresponds to a contiguous stretch of the genome: a read that spans an exon-exon junction joins two sequences separated by an intron. I would be much less hesitant to add RNA-Seq reads from genes that are expressed as a single exon to the assembly.
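A related, easier-to-apply filter, swapped in here rather than the exact single-exon-gene approach: align the RNA-seq reads to the draft assembly with a spliced aligner (HISAT2 and STAR, for example, report introns as N operations in the SAM CIGAR string) and keep only reads whose alignment contains no N. Unspliced reads from multi-exon genes also pass this filter, and on a fragmented draft a junction read may be clipped at a contig end instead of showing an N, so treat this as a rough sketch, not a guarantee.

```python
import re
from typing import Iterable, Iterator

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def is_spliced(cigar: str) -> bool:
    """True if the CIGAR string contains an 'N' (skipped reference region, i.e. an intron)."""
    return any(op == "N" for _, op in CIGAR_OP.findall(cigar))

def unspliced_read_names(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield names of mapped reads whose alignment does not skip reference bases."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        qname, flag, cigar = fields[0], int(fields[1]), fields[5]
        if flag & 4 or cigar == "*":
            continue  # unmapped read
        if not is_spliced(cigar):
            yield qname
```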

You noted that you're working with a non-model plant genome, but can you align reads to a completed plant genome? Not all available plant genomes are from model species. This may allow you to order reads more efficiently and with greater confidence than with the RNA-Seq reads. Or it may give you one assembly of the genome that you can use; the RNA-Seq assembly can be another. My point is that I found synteny very powerful when scaling from Arabidopsis to soybean.
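A minimal sketch of reference-guided ordering, assuming each contig has already been aligned to a related finished plant genome and reduced to a single best placement (the Placement record is hypothetical). It only orders and orients contigs, and it is trustworthy only where synteny between the two species actually holds; rearrangements will produce wrong orders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple

class Placement(NamedTuple):
    contig: str
    chrom: str   # chromosome of the related reference genome
    start: int   # position of the contig's best alignment on that chromosome
    strand: str  # '+' or '-'

def order_by_reference(placements: Iterable[Placement]) -> Dict[str, List[Tuple[str, str]]]:
    """Group contigs by reference chromosome and order them by alignment position."""
    by_chrom = defaultdict(list)
    for p in placements:
        by_chrom[p.chrom].append(p)
    return {
        chrom: [(p.contig, p.strand) for p in sorted(ps, key=lambda p: p.start)]
        for chrom, ps in by_chrom.items()
    }
```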

written 6.6 years ago by Larry_Parnell

Hi Larry, thanks for your suggestion. However, I am not sure I understand you. By 'My point is I found synteny very powerful when scaling from Arabidopsis to soybean', do you mean that the two genomes show conserved synteny?

written 6.6 years ago by Lhl730
Ric290 wrote, 3.8 years ago:

Hi, I found the following tools:

Or does anyone know of better tools?


written 3.8 years ago by Ric290
