Question: ABI SOLiD Transcriptome Assembly
7.4 years ago, Damian Kao wrote:

I have around 1.5 billion single-end 50 bp RNA-Seq reads from a SOLiD 3 instrument, taken from various conditions. Libraries were poly-A enriched and ribo-depleted. I've tried several different mapping/assembly strategies with TopHat + Cufflinks:

-mapped/assembled individual libraries

-mapped/assembled libraries grouped by biological condition: all control libraries together, all irradiation libraries together, etc.

-mapped/assembled randomly chosen libraries in increasing numbers of reads: 100 million, 200 million, 300 million, ...

-mapped/assembled everything together

I decided to go with mapping/assembly grouped by biological condition because I figured different conditions would produce different transcript compositions, and merging different compositions might confound the assembler's statistics. I then used Cuffmerge to merge all the assemblies together.

I looked at the splice junctions reported by TopHat for each of these runs and found that the number of discovered splice junctions plateaus at around half a billion reads, which makes me think the data reach information saturation at about that depth. Interestingly, my assembly gets progressively worse (more fragmented transcripts) as I add more reads.
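For anyone who wants to reproduce the saturation check, here is a minimal sketch of how the cumulative junction count could be computed. It assumes the junctions come from TopHat's `junctions.bed` output, read as BED12 lines in which two blocks flank each intron. The function names and the `(read_count, lines)` input layout are my own illustration, not part of TopHat.

```python
def junction_key(bed_line):
    """Return (chrom, intron_start, intron_end, strand) for one BED12 junction line."""
    fields = bed_line.rstrip("\n").split("\t")
    chrom, start, end, strand = fields[0], int(fields[1]), int(fields[2]), fields[5]
    block_sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
    # Assumption: the first and last blocks are the read overhangs on either
    # side of the intron, so trimming them gives the intron coordinates.
    return (chrom, start + block_sizes[0], end - block_sizes[-1], strand)


def saturation_curve(libraries):
    """Cumulative unique junctions as libraries are added in the given order.

    libraries: list of (read_count, iterable_of_bed_lines).
    Returns a list of (cumulative_reads, cumulative_unique_junctions).
    """
    seen = set()
    total_reads = 0
    curve = []
    for read_count, lines in libraries:
        total_reads += read_count
        for line in lines:
            # Skip blank lines and the "track" header line.
            if not line.strip() or line.startswith("track"):
                continue
            seen.add(junction_key(line))
        curve.append((total_reads, len(seen)))
    return curve
```

If the second value stops growing while the first keeps increasing, that is the plateau described above. Keying on intron coordinates rather than on the raw BED interval matters here, because the same junction can appear with different overhang lengths in different libraries.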

My main problems with the reference-based assembly are accuracy and coverage. Many assembled transcripts have ORFs interrupted by stop codons in the middle of the transcript. This might be due to TopHat not predicting splice junctions correctly, or perhaps to problems with the genome, which is AT-rich and split into around 25k supercontigs.
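To put a number on the interrupted-ORF problem, one rough approach is to find, for each assembled transcript, the longest stretch with no in-frame stop codon and compare it to the transcript length. The sketch below assumes the standard stop codons (TAA, TAG, TGA) and scans only the three forward frames, so transcripts on the minus strand would need to be reverse-complemented first. The function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def longest_stop_free_stretch(seq):
    """Return (frame, start, end) of the longest run of codons without a stop.

    Coordinates are 0-based, end-exclusive nucleotide positions in seq.
    Only the three forward frames are examined.
    """
    seq = seq.upper()
    best = (0, 0, 0)
    for frame in range(3):
        run_start = frame
        last = frame  # position just past the last complete codon seen
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                # Close the current run at this stop codon.
                if i - run_start > best[2] - best[1]:
                    best = (frame, run_start, i)
                run_start = i + 3
            last = i + 3
        # Close the run that reaches the end of the sequence.
        if last - run_start > best[2] - best[1]:
            best = (frame, run_start, last)
    return best
```

A transcript whose longest stop-free stretch covers only a small fraction of its length is a candidate for a mis-called junction or a frameshift in the underlying genome sequence, although long UTRs would give the same picture.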

I also have around 1 million Roche 454 reads that I've de novo assembled with Newbler and mapped back to the genome with GMAP. A fair number of the 454-assembled transcripts have no ABI (SOLiD) reads mapping to them, hence the coverage issue.
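The coverage comparison can be sketched as a plain interval-overlap check. This is a simplified illustration: it treats each 454 transcript as a single genomic span (contig, start, end) and each SOLiD alignment as one interval, all 0-based and half-open. A real comparison would work per exon, since a spliced transcript's span includes introns. The function name and the tuple layouts are assumptions, not the output format of GMAP or TopHat.

```python
from bisect import bisect_left
from collections import defaultdict


def uncovered_transcripts(transcripts, reads):
    """Return names of transcripts that no read interval overlaps.

    transcripts: iterable of (name, contig, start, end)
    reads:       iterable of (contig, start, end)
    """
    by_contig = defaultdict(list)
    max_len = 0
    for contig, start, end in reads:
        by_contig[contig].append((start, end))
        max_len = max(max_len, end - start)
    for hits in by_contig.values():
        hits.sort()

    missing = []
    for name, contig, t_start, t_end in transcripts:
        hits = by_contig.get(contig, [])
        # A read overlapping the transcript must start at or after
        # t_start - max_len + 1, so begin the scan there.
        i = bisect_left(hits, (t_start - max_len + 1,))
        covered = False
        while i < len(hits) and hits[i][0] < t_end:
            if hits[i][1] > t_start:
                covered = True
                break
            i += 1
        if not covered:
            missing.append(name)
    return missing
```

Sorting read starts per contig keeps the lookup cheap even with many supercontigs, because each transcript only scans reads that start near its own span.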

What are your experiences with doing a transcriptome assembly with SOLiD reads? Reference or de novo?

assembly rna • 2.0k views