Question

de novo hybrid assembly of transcriptome - Illumina PE and PacBio Full-length reads

9.8 years ago

Bade ▴ 40

Hi All,

I did a transcriptome assembly of Illumina SE and PE reads using Trinity, but the N50 values for my assembly are very low. Here is the summary:

Total trinity 'genes':    748144
Total trinity transcripts:    916206
Percent GC: 43.83

Contig N10: 1765
Contig N20: 1064
Contig N30: 677
Contig N40: 477
Contig N50: 369

Median contig length: 253
Average contig: 364.39
Total assembled bases: 333854406
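
For reference, the Nx values above can be read as follows: sort the contigs from longest to shortest and walk down the list until x% of all assembled bases have been covered; the length of the contig at which that happens is Nx. The sketch below illustrates this definition on made-up lengths. The function name `nx` and the toy list are hypothetical and are not taken from Trinity's own statistics script.

```python
def nx(lengths, x):
    """Return Nx: the contig length L such that contigs of length >= L
    together contain at least x percent of all assembled bases."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig to the shortest, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length


# Toy contig lengths (made up for illustration), 3000 bases in total.
toy_lengths = [1000, 800, 500, 300, 200, 100, 100]
n50 = nx(toy_lengths, 50)
print("N50:", n50)
```

With an N50 of 369 and a median contig length of 253, most of the assembled bases sit in short fragments rather than in full-length transcripts.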

In addition to the Illumina PE and SE data, I also have PacBio full-length transcript data. Is there any way to perform a hybrid transcriptome assembly that uses both? Any ideas would be appreciated.

PBcR with the Celera Assembler seems to be one option: it corrects PacBio reads and performs a hybrid assembly. However, 1) the authors themselves note that it is quite slow, 2) the documentation does not make clear whether the Illumina reads are used beyond the read-correction step, i.e. in the final assembly, and 3) the authors provide no example of using it for a hybrid 'transcriptome' assembly. These issues make me suspect that PBcR is a slow error-correction tool that uses only the PacBio reads for the final assembly. Are there any other options for de novo hybrid transcriptome assembly?

Thanks
Bade

RNA-Seq Assembly Illumina PacBio • 3.5k views


score 3 · Accepted Answer · 2015-10-01

If your PacBio reads are already corrected (you don't say whether they are), you can use them with Trinity. Quoting a post from the Trinity mailing list:

"You can incorporate corrected pacbio reads into Trinity using the --long_reads parameter."
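
As a rough sketch of what that could look like, the snippet below only builds and prints a Trinity command line; it does not run anything. The file names, output directory, CPU count and memory limit are placeholders I made up, and the helper `trinity_hybrid_cmd` is hypothetical. It also covers only the paired-end reads, so the single-end data would need to be added separately.

```python
import shlex


def trinity_hybrid_cmd(left, right, long_reads,
                       out_dir="trinity_hybrid_out", cpu=8, max_memory="50G"):
    """Build (but do not run) a Trinity command that adds corrected
    long reads to a paired-end Illumina assembly via --long_reads."""
    return [
        "Trinity",
        "--seqType", "fq",           # Illumina reads given as FASTQ
        "--left", left,              # PE mate 1
        "--right", right,            # PE mate 2
        "--long_reads", long_reads,  # corrected PacBio reads (FASTA)
        "--CPU", str(cpu),
        "--max_memory", max_memory,
        "--output", out_dir,         # Trinity wants 'trinity' in this name
    ]


# Placeholder file names, for illustration only.
cmd = trinity_hybrid_cmd("illumina_R1.fq", "illumina_R2.fq",
                         "pacbio_corrected.fasta")
print(shlex.join(cmd))
```

The printed line can be pasted into a shell or passed to `subprocess.run(cmd, check=True)` once the paths point at real files.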

rnaSPAdes is based on SPAdes, so it should be able to take both types of data, as should MIRA, but both will probably choke if the transcriptome is too large or complex.

If your PacBio reads are not corrected, correct them first and then try one of the above suggestions.