Question: transcriptome assembly redundancy removal
2.2 years ago by
40237468820
40237468820 wrote:

I de novo assembled several transcriptomes of the same organism and found that as the number of reads (samples) increases, the resulting assembly gets larger and larger. To my knowledge, this means the assembly contains redundant transcripts, right? I would like to know why this happens and how to remove the redundant transcripts. Another concern: when removing redundancy, could we lose genes that belong to the same family, or could the downstream quantification within a family be disturbed? As far as I know, the following steps may help: during assembly, use --normalize_reads to limit the maximum read coverage; after the Trinity assembly, use TGICL to extend the transcripts and CD-HIT to remove highly similar sequences. Are there other effective tools or strategies that can help with this?

rna-seq assembly • 1.4k views
2.2 years ago by
Spain. Universidad de Córdoba
Antonio R. Franco3.9k wrote:

Yes. You will most probably get redundant transcripts in the FASTA output, even if you used the read normalization option in Trinity.

Some of these contigs correspond to the same gene: not necessarily different isoforms, but alternative assemblies of the same transcript.
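To make the idea of "alternative assemblies of the same transcript" concrete, here is a toy sketch in Python. It is not what CD-HIT or any assembler actually does: it only drops contigs that are exact substrings (on either strand) of a longer contig. The sequences, contig names and the `drop_contained` helper are all made up for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def drop_contained(contigs):
    """Keep a contig unless it is an exact substring, on either strand,
    of a longer (or equally long) contig that was already kept.

    contigs: dict mapping contig name -> sequence.
    """
    kept = {}
    # Visit the longest contigs first so fragments are compared against them.
    by_length = sorted(contigs.items(), key=lambda kv: len(kv[1]), reverse=True)
    for name, seq in by_length:
        rc = reverse_complement(seq)
        if any(seq in other or rc in other for other in kept.values()):
            continue  # redundant: fully contained in a kept contig
        kept[name] = seq
    return kept


# Hypothetical contigs: one full-length transcript, two fragments of it
# (one reported on the opposite strand) and a paralog with one mismatch.
contigs = {
    "gene1_full": "ATGGCCAAGTTGACCGGTAACCTGA",
    "gene1_fragment": "GCCAAGTTGACCGG",
    "gene1_fragment_rc": reverse_complement("AAGTTGACCGGTAAC"),
    "gene1_paralog": "ATGGCCAAGTTCACCGGTAACCTGA",
}

print(sorted(drop_contained(contigs)))
# ['gene1_full', 'gene1_paralog']
```

With exact containment the paralog survives because it differs by one base. A similarity-based tool run below 100% identity would likely merge it with `gene1_full`, which is the gene-family concern raised in the question.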

Now, depending on the size of your assembled transcriptome, you have several choices.

One is to use CD-HIT, but you may be limited by the size of your transcriptome.
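As a rough sketch of the CD-HIT route, the snippet below only builds a command line for cd-hit-est (the nucleotide program in the CD-HIT suite) and prints it; it does not run anything. The file names, the 0.95 identity threshold, and the thread and memory values are placeholders, and the helper functions are hypothetical. The word length (`-n`) ranges follow the recommendation in the CD-HIT user guide as far as I recall, so check them against your installed version.

```python
import shlex


def word_size_for(identity):
    """Pick a cd-hit-est word length (-n) for a given identity threshold (-c)."""
    if not 0.75 <= identity <= 1.0:
        raise ValueError("identity threshold should be between 0.75 and 1.0")
    # (lower bound of identity range, word length), highest range first.
    ranges = [(0.95, 10), (0.90, 8), (0.88, 7), (0.85, 6), (0.80, 5), (0.75, 4)]
    for lower, n in ranges:
        if identity >= lower:
            return n


def build_cdhit_est_command(fasta_in, fasta_out, identity=0.95,
                            threads=4, memory_mb=16000):
    """Return the cd-hit-est call as an argument list (nothing is executed)."""
    return [
        "cd-hit-est",
        "-i", fasta_in,                       # input FASTA (assembled transcripts)
        "-o", fasta_out,                      # output FASTA (cluster representatives)
        "-c", str(identity),                  # sequence identity threshold
        "-n", str(word_size_for(identity)),   # word length matching the threshold
        "-T", str(threads),                   # number of threads
        "-M", str(memory_mb),                 # memory limit in MB
    ]


# Placeholder file names; replace them with your own assembly.
cmd = build_cdhit_est_command("Trinity.fasta", "Trinity.cdhit95.fasta")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once cd-hit-est is installed. Raising `-M` and `-T` is the usual way to cope with larger transcriptomes.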

Another choice is to use the MIRA or even the CAP3 assembler, which can generate contigs for you. Yet another approach is the one used by programs like IDBA-tran.

I am sure there must be other alternatives as well. Here you have a nice paper dealing with this subject.


Thank you. I think my transcriptome is less than 200 MB, so maybe it is better to use cd-hit-est to remove the redundant transcripts, right? But which identity threshold should count as redundant, 0.95 or higher? And should I extend the contigs before I remove the redundant ones?

written 2.2 years ago by 40237468820