Question: Finding out whether a transcriptome is fragmented
EVR wrote, 3.6 years ago:


I have a de novo assembled transcriptome generated with the Trinity pipeline. I would like to know whether this assembly is fragmented or not.

Thanks in advance

Michael Dondrup (Bergen, Norway) wrote, 3.6 years ago:

Even though you gave no other details, such as assembly statistics, the answer to the question as asked is YES. Any transcriptome you can get with current technology will be fragmented, because sequencing generates fragments (reads), and there is no guarantee that all of them can be assembled correctly and completely. However, I assume that is not what you had in mind. Possibly you want to know to what degree your transcriptome is fragmented and how much that affects your analysis, or, better, how complete it is. To address this, you need to calculate transcriptome statistics. Trinity has a script that gives you some of these stats, but you may also want to try other evaluation methods, such as BUSCO.
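As a rough illustration of the kind of length statistics meant here, a minimal sketch that reads a FASTA file and reports contig count, total length, mean and maximum length, and N50. The function names are hypothetical, it is not Trinity's own script, and it treats every FASTA record (every isoform) as a separate contig. Keep in mind that N50 measures contiguity, not completeness, so it is only a first hint.

```python
from typing import Dict, Iterable, List


def read_fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA record ID (first word of the header) to its sequence length."""
    lengths: Dict[str, int] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def n50(lengths: List[int]) -> int:
    """Largest L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def assembly_stats(lengths: Dict[str, int]) -> Dict[str, float]:
    """Basic contiguity summary over all records."""
    values = list(lengths.values())
    return {
        "contigs": len(values),
        "total_bases": sum(values),
        "mean_length": sum(values) / len(values) if values else 0.0,
        "max_length": max(values, default=0),
        "N50": n50(values),
    }
```

Usage would be something like `with open("Trinity.fasta") as handle: print(assembly_stats(read_fasta_lengths(handle)))`.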

$TRINITY_HOME/util/  Trinity.fasta

Trinity also has downstream analysis tools for assessing the completeness of the transcriptome.

Rohit wrote, 3.6 years ago:

Usually you can't expect full-length transcripts from de novo Illumina assemblies unless you have some long reads to scaffold them. One way to go about it is to look for orthologs of your transcripts in a closely related model species and compare their lengths with yours. For this you would need to extract the CDS, translate it into a protein sequence, and then run BLAST against UniProt or the NCBI nr (non-redundant) database. Using the most similar sequence, you can compare your protein length and see how far off you might be. There may also be cases where your transcripts are shorter because the protein really is shorter in your organism; in those cases you just have to check whether your CDS is complete, i.e. has both a start and a stop codon. Even then, a CDS can look complete but still be missing one or more exons, due to misassembly or alternative isoforms.
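The CDS check described above could be sketched roughly like this, under several assumptions: the standard genetic code, a simple "longest ORF" heuristic over all six frames (or only the forward strand if your library was strand-specific), and hypothetical completeness labels. A dedicated ORF predictor will be more careful; this only shows the idea. The `homolog_length` argument is assumed to come from your best BLASTP hit, for example from the `slen` column if you request it in a custom tabular output format.

```python
from typing import Tuple

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(seq: str) -> str:
    """Translate whole codons only; unknown codons (e.g. containing N) become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def longest_orf(transcript: str, both_strands: bool = True) -> Tuple[str, bool, bool]:
    """Return (protein, has_start, has_stop) for the longest candidate ORF.

    Candidates are stretches between stop codons, trimmed to their first Met;
    a stretch without Met is kept only if it runs off the 5' end of the frame.
    """
    seq = transcript.upper().replace("U", "T")
    strands = [seq]
    if both_strands:  # unstranded library: the CDS may lie on either strand
        strands.append(seq.translate(COMPLEMENT)[::-1])
    best = ("", False, False)
    for strand in strands:
        for frame in range(3):
            segments = translate(strand[frame:]).split("*")
            for idx, seg in enumerate(segments):
                has_stop = idx < len(segments) - 1  # a stop codon follows this stretch
                if "M" in seg:
                    protein, has_start = seg[seg.index("M"):], True
                elif idx == 0:
                    protein, has_start = seg, False  # may be truncated at the 5' end
                else:
                    continue  # stop-to-stop stretch without any start codon
                if len(protein) > len(best[0]):
                    best = (protein, has_start, has_stop)
    return best


def completeness(has_start: bool, has_stop: bool) -> str:
    """Hypothetical labels for which ends of the CDS are present."""
    if has_start and has_stop:
        return "complete"
    if has_stop:
        return "5prime_partial"  # no start codon: 5' end likely missing
    if has_start:
        return "3prime_partial"  # no stop codon: 3' end likely missing
    return "internal"            # neither end present


def length_ratio(protein: str, homolog_length: int) -> float:
    """Fraction of the best homolog's length covered by the predicted protein."""
    if homolog_length <= 0:
        raise ValueError("homolog_length must be positive")
    return len(protein) / homolog_length
```

A ratio well below 1.0 together with a partial label suggests a fragment; a complete ORF with a low ratio is the ambiguous case mentioned above (a genuinely shorter protein, or missing exons).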

BUSCO, as Michael mentioned, is always a good standard check at the end.
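BUSCO sorts each conserved ortholog it searches for into complete (single-copy or duplicated), fragmented, or missing, so its fragmented (F) share speaks directly to the original question. A minimal sketch for pulling those numbers out of BUSCO's one-line summary, assuming the `C:..%[S:..%,D:..%],F:..%,M:..%,n:..` layout; the exact format may differ between BUSCO versions, and the function names are hypothetical.

```python
import re
from typing import Dict

# Assumed shape of BUSCO's one-line summary: C:..%[S:..%,D:..%],F:..%,M:..%,n:..
SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(text: str) -> Dict[str, float]:
    """Extract complete (C = single-copy S + duplicated D), fragmented (F)
    and missing (M) percentages plus the number of BUSCOs searched (n)."""
    match = SUMMARY_RE.search(text)
    if match is None:
        raise ValueError("no BUSCO summary line found")
    values: Dict[str, float] = {k: float(v) for k, v in match.groupdict().items()}
    values["n"] = int(values["n"])
    return values


def describe(values: Dict[str, float]) -> str:
    """One-line readable summary focused on fragmentation."""
    return (
        f"{values['C']:.1f}% complete, {values['F']:.1f}% fragmented, "
        f"{values['M']:.1f}% missing out of {values['n']} BUSCOs"
    )
```

A high F (and M) relative to C points to a fragmented or incomplete assembly; a high D in a transcriptome often just reflects multiple isoforms per gene.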

Powered by Biostar version 2.3.0