Question

Finding out whether a transcriptome is fragmented

Asked 6.9 years ago by EVR

Hi,

I have a de novo transcriptome assembly generated with the Trinity pipeline. I would like to know whether this assembly is fragmented or not.

Thanks in advance

transcriptome denovo_assembly

Updated 6.9 years ago by Rohit • written 6.9 years ago by EVR

score 0 · Answer 1 · 2017-06-07

Even though you gave no other details, such as assembly statistics, the answer to the question as asked is YES. Any transcriptome you can assemble with current technology will be fragmented, because sequencing produces short fragments and there is no guarantee that all of them can be assembled correctly and completely. However, I assume that is not what you had in mind. Possibly you want to know to what degree your transcriptome is fragmented and how much that affects your analysis, or, better, how complete it is. To address this, you need to calculate transcriptome statistics. Trinity has a script that reports some of these statistics, but you may also want to try other evaluation methods such as BUSCO.
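As a rough illustration of the kind of length statistics Trinity's own script reports (in the Trinity distribution I believe it is `util/TrinityStats.pl`), here is a minimal Python sketch that computes sequence count, total length, median length and N50 from a FASTA file. It does this both over all transcripts and over the longest isoform per Trinity "gene". It assumes Trinity-style identifiers such as `TRINITY_DN1000_c0_g1_i1`, where stripping the trailing `_i<N>` gives the gene ID; all function names are my own. Keep in mind that N50 alone says little about how complete a transcriptome is, which is why a gene-content check such as BUSCO is worth running too.

```python
import re
from statistics import median


def read_fasta_lengths(lines):
    """Map each sequence ID (first word of the header) to its length."""
    lengths = {}
    seq_id = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            seq_id = line[1:].split()[0]
            lengths[seq_id] = 0
        elif seq_id is not None:
            lengths[seq_id] += len(line)  # sequences may span several lines
    return lengths


def nx(values, fraction=0.5):
    """Nx statistic (N50 for fraction=0.5): walk lengths from longest to
    shortest and return the length at which the running total first reaches
    `fraction` of all bases."""
    ordered = sorted(values, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    return 0


def trinity_gene_id(transcript_id):
    # Assumption: Trinity-style IDs, e.g. TRINITY_DN1000_c0_g1_i1 -> TRINITY_DN1000_c0_g1
    return re.sub(r"_i\d+$", "", transcript_id)


def longest_isoform_per_gene(lengths):
    best = {}
    for transcript_id, length in lengths.items():
        gene = trinity_gene_id(transcript_id)
        best[gene] = max(best.get(gene, 0), length)
    return best


def assembly_stats(lengths):
    values = list(lengths.values())
    return {
        "n_seqs": len(values),
        "total_bases": sum(values),
        "median_len": median(values),
        "N50": nx(values, 0.5),
    }


# Toy input (made-up sequences) just to show the calls.
TOY_FASTA = """\
>TRINITY_DN1_c0_g1_i1 len=10
AAAAAAAAAA
>TRINITY_DN1_c0_g1_i2 len=8
AAAAAAAA
>TRINITY_DN2_c0_g1_i1 len=6
AAAAAA
>TRINITY_DN3_c0_g1_i1 len=4
AAAA
>TRINITY_DN4_c0_g1_i1 len=2
AA
"""

lengths = read_fasta_lengths(TOY_FASTA.splitlines())
print("all transcripts:", assembly_stats(lengths))
# all transcripts: {'n_seqs': 5, 'total_bases': 30, 'median_len': 6, 'N50': 8}
print("longest isoform per gene:", assembly_stats(longest_isoform_per_gene(lengths)))
# longest isoform per gene: {'n_seqs': 4, 'total_bases': 22, 'median_len': 5.0, 'N50': 6}
```

For a real assembly, pass an open file handle of your Trinity FASTA to `read_fasta_lengths` instead of the toy string.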

score 0 · Answer 2 · 2017-06-07

Usually you can't expect full-length transcripts from de novo Illumina assemblies unless you have some long reads to scaffold them. One way to go about it is to look for orthologs of your transcripts in a closely related model species and compare their lengths with yours. For this you would extract the CDS, translate it into a protein sequence and then run BLAST against UniProt or NCBI nr. Comparing your protein's length with that of the most similar sequence tells you roughly how much you might be off by. There may also be cases where your transcripts are shorter because the protein really is shorter in your organism; in those cases you just have to check whether your CDS is complete, i.e. has both a start and a stop codon. Even then, a CDS can look complete and still be missing one or more exons, due to misassembly or to alternative isoforms.
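For extracting the CDS and checking whether it is complete, TransDecoder is the tool commonly used on Trinity output. Purely to illustrate the idea, here is a simplified Python sketch (my own, not TransDecoder's algorithm) that finds the longest open reading frame on both strands with the standard genetic code, translates it, and labels it `complete`, `5prime_partial`, `3prime_partial` or `internal`, names borrowed from TransDecoder's output. An ORF with no upstream in-frame stop is extended to the 5' end of the transcript, so a missing ATG there counts as 5' partial. It applies no minimum length or coding-potential scoring, so on real data the longest ORF can be spurious; set `both_strands=False` for strand-specific libraries.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

STATUS = {
    (True, True): "complete",
    (False, True): "5prime_partial",
    (True, False): "3prime_partial",
    (False, False): "internal",
}


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(cds):
    """Translate codon by codon; codons with ambiguous bases become 'X'."""
    cds = cds.upper()
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


def _orf_from_segment(seq, begin, end, open_5prime, has_stop):
    """Pick the ORF inside one stop-free stretch [begin, end) of a reading frame."""
    coding_end = end - 3 if has_stop else end
    if open_5prime:
        # No in-frame stop upstream: the ORF may continue past the transcript's 5' end.
        start = begin
    else:
        # Otherwise the ORF has to begin at the first in-frame ATG.
        start = next((p for p in range(begin, coding_end - 2, 3)
                      if seq[p:p + 3] == START_CODON), None)
        if start is None:
            return None
    if coding_end - start < 3:  # no sense codons at all
        return None
    return start, end, seq[start:start + 3] == START_CODON, has_stop


def orfs_in_frame(seq, frame):
    """Yield (start, end, has_start, has_stop) for each ORF in one frame; end is exclusive."""
    seg_begin, open_5prime = frame, True
    for pos in range(frame, len(seq) - 2, 3):
        if seq[pos:pos + 3] in STOP_CODONS:
            orf = _orf_from_segment(seq, seg_begin, pos + 3, open_5prime, True)
            if orf:
                yield orf
            seg_begin, open_5prime = pos + 3, False
    # Whatever follows the last stop runs off the 3' end without a stop codon.
    last_end = frame + 3 * ((len(seq) - frame) // 3)
    orf = _orf_from_segment(seq, seg_begin, last_end, open_5prime, False)
    if orf:
        yield orf


def longest_orf(transcript, both_strands=True):
    seq = transcript.upper()
    strands = [("+", seq)]
    if both_strands:
        strands.append(("-", reverse_complement(seq)))
    best = None
    for strand, s in strands:
        for frame in range(3):
            for start, end, has_start, has_stop in orfs_in_frame(s, frame):
                if best is None or end - start > best["length"]:
                    best = {
                        "strand": strand, "start": start, "end": end,
                        "length": end - start,
                        "status": STATUS[(has_start, has_stop)],
                        "protein": translate(s[start:end]).rstrip("*"),
                    }
    return best
```

For example, `longest_orf("ATGTCAGCTTAA")` gives a `complete` ORF on the `+` strand with protein `MSA`, while `longest_orf("ATGTCAGCT")` is `3prime_partial` because the stop codon is missing.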
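The length comparison against the most similar sequence can be summarised with a short script once BLAST has run. This sketch assumes BLAST+ tabular output requested with `-outfmt "6 qseqid sseqid pident length qstart qend sstart send evalue bitscore qlen slen"` (for example from `blastp` against a UniProt database you formatted yourself; database and file names are up to you). It keeps the highest-bitscore line per query and reports the query/hit length ratio plus the fraction of the hit covered by the alignment. The 0.8 coverage cutoff is an arbitrary placeholder, and only the single best HSP is used, so a hit split over several HSPs will look less covered than it really is.

```python
from dataclasses import dataclass

# Column order assumed for -outfmt "6 qseqid sseqid pident length qstart qend
# sstart send evalue bitscore qlen slen".
COLUMNS = ["qseqid", "sseqid", "pident", "length", "qstart", "qend",
           "sstart", "send", "evalue", "bitscore", "qlen", "slen"]


@dataclass
class HitSummary:
    query: str
    subject: str
    bitscore: float
    length_ratio: float      # your protein length / best-hit protein length
    subject_coverage: float  # fraction of the best hit spanned by the alignment


def best_hits(lines):
    """Keep the highest-bitscore line per query from tabular BLAST output."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(COLUMNS, line.rstrip("\n").split("\t")))
        query, bitscore = row["qseqid"], float(row["bitscore"])
        if query in best and best[query].bitscore >= bitscore:
            continue
        qlen, slen = int(row["qlen"]), int(row["slen"])
        sstart, send = int(row["sstart"]), int(row["send"])
        best[query] = HitSummary(
            query=query,
            subject=row["sseqid"],
            bitscore=bitscore,
            length_ratio=qlen / slen,
            subject_coverage=(abs(send - sstart) + 1) / slen,
        )
    return best


def likely_fragments(best, min_coverage=0.8):
    """Queries whose best hit is covered less than `min_coverage` (arbitrary cutoff)."""
    return sorted(q for q, hit in best.items() if hit.subject_coverage < min_coverage)
```

A ratio well below 1 with low coverage suggests a fragment; a ratio near 1 with high coverage suggests an essentially full-length protein, keeping in mind the caveat above that the protein may genuinely differ in length in your organism.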

BUSCO, as mentioned by Michael, is always a good standard check at the end.
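With recent BUSCO versions a transcriptome run looks roughly like `busco -i Trinity.fasta -l <lineage_dataset> -o busco_out -m transcriptome`, and the short summary contains a one-line notation in which F is the percentage of fragmented BUSCOs, the most direct answer to the original question. A minimal Python sketch for pulling those numbers out of that line follows; it assumes the `C:..%[S:..%,D:..%],F:..%,M:..%,n:..` layout, and the example values are made up, not results. With Trinity output, D (duplicated) can be inflated because several isoforms per gene are reported, so running BUSCO on the longest isoform per gene is a common workaround.

```python
import re

# One-line BUSCO notation, e.g. "C:90.0%[S:60.0%,D:30.0%],F:5.0%,M:5.0%,n:200"
BUSCO_LINE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_line(text):
    """Return the C, S, D, F, M percentages and the number n of BUSCOs searched."""
    match = BUSCO_LINE.search(text)
    if match is None:
        raise ValueError("no BUSCO one-line summary found")
    summary = {key: float(value) for key, value in match.groupdict().items() if key != "n"}
    summary["n"] = int(match["n"])
    return summary


def approx_count(summary, category):
    """Convert a percentage back to an approximate number of BUSCOs."""
    return round(summary["n"] * summary[category] / 100)


example = "\tC:90.0%[S:60.0%,D:30.0%],F:5.0%,M:5.0%,n:200\t"
summary = parse_busco_line(example)
print(summary["F"], approx_count(summary, "F"))  # 5.0 10
```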