Question

Assembling a single transcript sequence from RNA-seq data

9.6 years ago

apt.university ▴ 70

I am looking for suggestions on how to assemble a single, fairly complex transcript sequence from RNA-Seq data. The protein this transcript encodes has a variable number of repeated 10-amino-acid domains. Assembling with Trinity or SOAPdenovo-Trans did not generate a complete sequence for the protein: the assembled protein is missing other domains found in known orthologs.

I also tried aligning reads against orthologs (using USEARCH) and then assembling the reads that aligned with CAP3 and Velvet. That approach actually did worse than Trinity.

Any suggestions on how to accurately assemble that single sequence?

Thanks

RNA-Seq Assembly • 2.3k views

updated 2.9 years ago by Ram 45k • written 9.6 years ago by apt.university ▴ 70

You might try SPAdes or Tadpole (in the BBMap package); both of them handle highly variable coverage better than most isolate assemblers. However, if the transcript is differentially spliced, the result will probably still not be great.

updated 5.6 years ago by Ram 45k • written 9.6 years ago by Brian Bushnell 20k

Thanks for the suggestion, Brian. Once I pulled out reads that aligned against the homologue sequence (as well as their non-aligned pairs), I experimented with a plethora of assemblers, including SPAdes. None of them yielded a complete transcript.
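For illustration, the read-pulling step (keeping every pair in which at least one mate aligned) might look roughly like the sketch below. It is only a sketch under stated assumptions: the function names `base_id` and `select_pairs` are hypothetical, reads are assumed to be already loaded as `(id, sequence)` tuples in matching order for both mate files, and mates are assumed to share a name apart from an optional `/1` or `/2` suffix.

```python
def base_id(read_id):
    """Drop any description after whitespace and a trailing /1 or /2 mate suffix.

    Assumption: mates are named identically apart from that suffix.
    """
    name = read_id.split()[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def select_pairs(hit_ids, reads1, reads2):
    """Keep every read pair in which at least one mate appears in hit_ids.

    hit_ids: read names reported by the aligner (hypothetical input).
    reads1, reads2: lists of (id, sequence) tuples, mates in the same order.
    Returns two lists (kept mate-1 reads, kept mate-2 reads).
    """
    wanted = {base_id(h) for h in hit_ids}
    kept1, kept2 = [], []
    for (id1, seq1), (id2, seq2) in zip(reads1, reads2):
        if base_id(id1) != base_id(id2):
            raise ValueError(f"mate files out of sync: {id1} vs {id2}")
        # Keep the pair if either mate aligned, so the non-aligned
        # partner is carried along into the assembly input.
        if base_id(id1) in wanted:
            kept1.append((id1, seq1))
            kept2.append((id2, seq2))
    return kept1, kept2
```

Carrying the non-aligned mate along matters here because it may extend into regions that the ortholog does not cover.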

I wonder whether there are tools that use a greedy approach to iteratively add reads to both ends of a partial transcript. Perhaps that would be intractable for large datasets, but maybe it is a viable solution for a single transcript, or even a few. Would such an approach work, or would it be overwhelmed by the complexity of assembly? I wonder...
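To make the greedy idea concrete, here is a minimal toy sketch of seed extension by exact suffix-prefix overlap. It is not any existing tool's algorithm: the function names and the `min_overlap` default are assumptions chosen for illustration, reads must match exactly (no tolerance for sequencing errors), and each read is used at most once per orientation so the loop always terminates.

```python
def revcomp(seq):
    """Reverse-complement a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def best_extension(contig, pool, min_overlap):
    """Find the pooled read whose prefix matches the longest suffix of contig.

    Returns (read_index, overlap_length), or None if no read overlaps by at
    least min_overlap bases while still adding at least one new base.
    """
    best = None
    for i, read in enumerate(pool):
        longest = min(len(read) - 1, len(contig))
        for ov in range(longest, min_overlap - 1, -1):
            if contig.endswith(read[:ov]):
                if best is None or ov > best[1]:
                    best = (i, ov)
                break  # longest overlap for this read has been found
    return best


def extend_right(contig, pool, min_overlap):
    """Greedily append reads to the right end, removing each read once used."""
    while True:
        hit = best_extension(contig, pool, min_overlap)
        if hit is None:
            return contig
        i, ov = hit
        contig += pool.pop(i)[ov:]


def greedy_extend(seed, reads, min_overlap=20):
    """Extend a seed sequence in both directions using exact overlaps."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    # Pool both orientations, since read strand is unknown in this sketch.
    pool = []
    for read in reads:
        pool.extend((read, revcomp(read)))
    contig = extend_right(seed, pool, min_overlap)
    # Extending the left end is the same as extending the right end
    # of the reverse complement, then flipping back.
    return revcomp(extend_right(revcomp(contig), pool, min_overlap))
```

Calling `greedy_extend(partial_transcript, reads, min_overlap=20)` returns the seed grown as far as unambiguous exact overlaps allow. One caveat that applies directly to this transcript: when the overlap is shorter than a run of tandem repeats, several reads fit equally well, so a greedy choice can collapse or inflate the repeat count rather than recover the true number of copies.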

updated 5.6 years ago by Ram 45k • written 9.6 years ago by apt.university ▴ 70