Question: Assembling a single transcript sequence from RNA-seq data
3.4 years ago by
United States
apt.university70 wrote:

I am looking for suggestions on how to assemble a single, fairly complex transcript sequence from RNA-Seq data. The protein this transcript encodes has a variable number of repeated 10-amino-acid domains. Assembling with Trinity or SOAPdenovo-Trans did not generate a complete sequence for the protein; the protein does not contain other domains found in known orthologs.

I also tried aligning reads against orthologs (using USEARCH) and then assembled the reads that aligned with CAP3 and Velvet. That approach actually did worse than Trinity.

Any suggestions on how to accurately assemble that single sequence?



rna-seq assembly • 1.1k views

You might try SPAdes or Tadpole (in the BBMap package); both of them handle highly variable coverage better than most isolate assemblers. However, if the transcript is differentially spliced, the result will probably still not be great.

written 3.4 years ago by Brian Bushnell 16k

Thanks for the suggestion, Brian. Once I pulled out reads that aligned against the homologue sequence (as well as their non-aligned pairs), I experimented with a plethora of assemblers, including SPAdes. None of them yielded a complete transcript. 
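For anyone wanting to reproduce the read-recruitment step, here is a rough Python sketch of keeping every pair in which at least one mate aligned. The names `normalize_id` and `recruit_pairs` are hypothetical, and the sketch assumes the two mate lists are in the same order and that mate IDs differ only by a trailing `/1` or `/2`; other header conventions would need a different `normalize_id`.

```python
def normalize_id(read_id):
    """Strip a trailing /1 or /2 mate suffix so both mates share one key.

    Assumption: mates are named like 'readA/1' and 'readA/2'.
    """
    if read_id.endswith(("/1", "/2")):
        return read_id[:-2]
    return read_id


def recruit_pairs(aligned_ids, mates1, mates2):
    """Return the pairs in which at least one mate is in aligned_ids.

    aligned_ids: iterable of read IDs reported by the aligner
    mates1, mates2: iterables of (read_id, sequence) tuples, same order
    """
    wanted = {normalize_id(read_id) for read_id in aligned_ids}
    kept = []
    for (id1, seq1), (id2, seq2) in zip(mates1, mates2):
        if normalize_id(id1) in wanted or normalize_id(id2) in wanted:
            # Keep both mates, even if only one of them aligned.
            kept.append(((id1, seq1), (id2, seq2)))
    return kept
```

Keeping the non-aligned mate is what lets the recruited set reach into regions that the homologue does not cover.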

I wonder whether there are tools that use a greedy approach to iteratively add reads to both ends of a partial transcript. Perhaps that would be intractable for large datasets, but maybe it is a viable solution for a single transcript, or even a few. Would such an approach work, or would it be overwhelmed by the complexity of the assembly? I wonder....
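To make the idea concrete, here is a minimal Python sketch of greedy seed extension. It is only an illustration under stated assumptions, not how any particular tool works: it extends one base at a time using k-mer counts from the reads rather than adding whole reads, and the names `build_kmer_counts`, `extend_right`, `extend_both`, `k`, and `min_count` are all hypothetical. The seed must be at least `k - 1` bases long.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string (non-ACGT characters are left as-is)."""
    return seq.translate(COMPLEMENT)[::-1]


def build_kmer_counts(reads, k):
    """Count every k-mer in the reads on both strands."""
    counts = Counter()
    for read in reads:
        for strand in (read, revcomp(read)):
            for i in range(len(strand) - k + 1):
                counts[strand[i:i + k]] += 1
    return counts


def extend_right(seq, counts, k, min_count=2, max_steps=10000):
    """Greedily append bases while exactly one supported next base exists."""
    seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    for _ in range(max_steps):
        suffix = seq[-(k - 1):]
        candidates = [b for b in "ACGT" if counts[suffix + b] >= min_count]
        if len(candidates) != 1:
            break  # dead end (no support) or branch (ambiguous next base)
        kmer = suffix + candidates[0]
        if kmer in seen:
            break  # k-mer already used: stop instead of looping forever
        seen.add(kmer)
        seq += candidates[0]
    return seq


def extend_both(seed, reads, k=31, min_count=2):
    """Extend a seed to the right, then to the left via its reverse complement."""
    counts = build_kmer_counts(reads, k)
    right = extend_right(seed, counts, k, min_count)
    return revcomp(extend_right(revcomp(right), counts, k, min_count))
```

Note that the sketch deliberately stops at any branch or repeated k-mer. For a transcript with tandem copies of a short domain, that is probably exactly where it would halt unless `k` spans the repeat, so it illustrates the difficulty as much as the approach.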



written 3.4 years ago by apt.university70
