Question: Finding fragmented transcriptome
22 months ago by
EVR510 wrote:


I have a de novo assembled transcriptome generated using the Trinity pipeline. I would like to know whether this de novo assembled transcriptome is fragmented or not.

Thanks in advance

22 months ago by
Bergen, Norway
Michael Dondrup46k wrote:

Even though you gave no other details, such as assembly statistics, the answer to the question as asked is YES. Any transcriptome you can get with current technology will be fragmented, because the sequencing technology generates fragments and there is no guarantee that all of them can be assembled correctly and completely. However, I assume that is not what you had in mind. You probably want to know to what degree your transcriptome is fragmented and how much that affects your analysis, or, better, how complete it is. To address this, you need to calculate assembly statistics for the transcriptome. Trinity has a script that gives you some of these stats, but you may also want to try other evaluation methods such as BUSCO.
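As a rough illustration of the kind of assembly statistic meant here, the sketch below computes contig lengths and N50 from FASTA-formatted lines. It is not Trinity's own script; the function names `read_fasta_lengths` and `n50` are made up for this example, and N50 is taken to be the length of the contig at which the running total of lengths, sorted longest first, reaches half of the total assembly length.

```python
def read_fasta_lengths(lines):
    """Yield the length of each sequence found in FASTA-formatted lines."""
    length = None  # None until the first header line is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for contig_length in ordered:
        running += contig_length
        if running >= half_total:
            return contig_length
    return 0


def summarize(path):
    """Print a few basic length statistics for a FASTA file at `path`."""
    with open(path) as handle:
        lengths = list(read_fasta_lengths(handle))
    print("contigs:", len(lengths))
    print("total bases:", sum(lengths))
    print("longest:", max(lengths) if lengths else 0)
    print("N50:", n50(lengths))
```

Calling `summarize("Trinity.fasta")` on the assembly would print the summary. Keep in mind that for transcriptomes N50 is only a coarse indicator, since real transcripts vary widely in length, which is one reason to look at completeness measures as well.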

$TRINITY_HOME/util/  Trinity.fasta

Trinity also has downstream analysis tools for assessing the completeness of the transcriptome.

22 months ago by
Rohit1.3k wrote:

Usually you can't expect full-length transcripts from de novo Illumina assemblies unless you have some long reads to scaffold them. One way to go about it is to look for orthologs of your transcripts in a closely related model species and compare their lengths to yours. For this you would need to extract the CDS, translate it into a protein sequence, and then BLAST it against UniProt or the NCBI nr database. Comparing your protein's length with that of the most similar sequence tells you how much you might be off by. There may also be cases where your transcript is shorter simply because the protein is smaller in your organism; in those cases, check whether your CDS is complete, i.e. has a start and a stop codon. Even a complete CDS may still be missing an exon or more, due to misassembly or isoforms.
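The two checks described above (is the CDS complete, and how long is the protein relative to its best hit) can be sketched in a few lines of Python. This is only an illustration under some assumptions: the standard genetic code, ATG as the only start codon, and protein lengths that you have already pulled out of your own BLAST results. The function names `cds_is_complete` and `length_ratio` are invented for this example.

```python
# Stop codons of the standard genetic code (an assumption; other codes differ).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def cds_is_complete(cds):
    """Return True if the CDS starts with ATG, ends with a stop codon,
    is a whole number of codons, and has no in-frame internal stop."""
    cds = cds.upper()
    if len(cds) < 6 or len(cds) % 3 != 0:
        return False
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    has_start = codons[0] == "ATG"
    has_stop = codons[-1] in STOP_CODONS
    internal_stop = any(codon in STOP_CODONS for codon in codons[1:-1])
    return has_start and has_stop and not internal_stop


def length_ratio(query_protein_length, hit_protein_length):
    """Return query length divided by the length of the most similar protein.

    Values well below 1 suggest the assembled transcript may be truncated.
    """
    if hit_protein_length <= 0:
        raise ValueError("hit_protein_length must be positive")
    return query_protein_length / hit_protein_length
```

What ratio counts as "too short" is a judgment call that depends on how close the reference species is, so no cutoff is hard-coded here. A complete CDS with a low ratio is the case worth inspecting by hand for a missing exon.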

BUSCO as mentioned by Michael is always a good standard check at the end.
