Removing redundant transcripts from a de novo transcriptome assembly
7.6 years ago
402374688 ▴ 30

I de novo assembled several transcriptomes of the same organism and found that the more reads (samples) I include, the larger the resulting assembly becomes. To my knowledge, the larger assemblies must contain redundant transcripts, right? Why does this happen, and how can I remove these redundant transcripts? One more concern: when removing redundancy, is it possible to lose genes that belong to the same family, or to disturb the downstream quantification of members of the same family? As far as I know, the following steps may help: during assembly, use --normalize_reads to limit the maximum read coverage; after the Trinity assembly, use TGICL to extend the transcripts and CD-HIT to remove highly similar sequences. Are there other effective tools or strategies for this?

rna-seq Assembly • 3.6k views
7.6 years ago

Yes. You will most probably get redundant transcripts in the FASTA output, even if you used the read normalization option in Trinity.

Some of these contigs correspond to the same gene. They are not necessarily different isoforms; they can simply be alternative assemblies of the same transcript.
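To get a rough idea of how many contigs share a gene, you could group the FASTA headers by gene ID. This is only a sketch: it assumes Trinity-style IDs such as `TRINITY_DN10_c0_g1_i1` (gene = everything before the `_i<N>` suffix), which may not match the naming of your Trinity version, and the example records are made up.

```python
import re
from collections import defaultdict

# ASSUMPTION: Trinity-style IDs like TRINITY_DN10_c0_g1_i1, where the part
# before "_i<N>" names the gene. Adjust the pattern for other naming schemes.
_ID_PATTERN = re.compile(r"^(?P<gene>.+_g\d+)_i\d+$")


def group_by_gene(fasta_lines):
    """Map each (assumed) gene ID to the list of transcript IDs assigned to it."""
    groups = defaultdict(list)
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        fields = line[1:].split()
        if not fields:
            continue  # header with no ID
        transcript_id = fields[0]
        match = _ID_PATTERN.match(transcript_id)
        # IDs that do not fit the pattern are treated as their own gene.
        gene_id = match.group("gene") if match else transcript_id
        groups[gene_id].append(transcript_id)
    return dict(groups)


# Made-up records, for illustration only.
example = [
    ">TRINITY_DN10_c0_g1_i1 len=500",
    "ACGT",
    ">TRINITY_DN10_c0_g1_i2 len=480",
    "ACGA",
    ">TRINITY_DN11_c0_g1_i1 len=900",
    "GGCC",
]

groups = group_by_gene(example)
for gene, transcripts in groups.items():
    print(gene, len(transcripts))
```

Genes with many contigs are the first place to look for redundancy, although a high count alone does not tell you whether they are real isoforms or assembly alternatives.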

Now, depending on the size of your assembled transcriptome, you have several choices.

One is CD-HIT, but it may be limited by the size of your transcriptome.
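If you go the CD-HIT route, the nucleotide program is `cd-hit-est`. Below is a small sketch that only builds the command line (it does not run anything). The file names, the thread count and the memory limit are placeholders I made up; the word-size ranges for `-n` follow what I remember from the CD-HIT user guide, so please check them against the documentation of your installed version.

```python
import shlex


def word_size_for(identity):
    """Choose a cd-hit-est word size (-n) for a given identity threshold (-c).

    Ranges as listed in the CD-HIT user guide for cd-hit-est
    (verify against your installed version).
    """
    if not 0.80 <= identity <= 1.0:
        raise ValueError("this sketch only covers thresholds from 0.80 to 1.0")
    if identity >= 0.95:
        return 10
    if identity >= 0.90:
        return 8
    if identity >= 0.88:
        return 7
    if identity >= 0.85:
        return 6
    return 5


def build_cd_hit_est_command(fasta_in, fasta_out, identity=0.95,
                             threads=4, memory_mb=16000):
    """Return the cd-hit-est call as an argument list (nothing is executed)."""
    return [
        "cd-hit-est",
        "-i", fasta_in,                       # input FASTA
        "-o", fasta_out,                      # output: representative sequences
        "-c", str(identity),                  # sequence identity threshold
        "-n", str(word_size_for(identity)),   # word length
        "-T", str(threads),                   # number of threads
        "-M", str(memory_mb),                 # memory limit in MB
    ]


# Placeholder file names, not taken from the question.
cmd = build_cd_hit_est_command("Trinity.fasta", "Trinity.cdhit95.fasta", 0.95)
print(shlex.join(cmd))
```

You can pass the list to `subprocess.run(cmd, check=True)` once `cd-hit-est` is installed and on your `PATH`.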

Another choice is to use the MIRA or even the CAP3 assembler, which can merge your transcripts into contigs. A different approach is taken by programs like IDBA-tran.

I am sure there are other alternatives as well. Here you have a nice paper dealing with this subject


Thank you. I think my transcriptome is smaller than 200 MB, so maybe it is better to use cd-hit-est to remove redundant transcripts, right? But which identity threshold should count as redundant: 0.95 or higher? And should I extend the contigs before removing the redundant ones?
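To make the threshold question concrete, here is a toy example of why the cutoff matters for gene families. It is not how cd-hit-est computes identity (that is alignment based); it just counts matching positions between two equal-length, made-up sequences, which is enough to show that two close paralogs can collapse into one cluster at 0.95 but stay separate at 0.98.

```python
def ungapped_identity(seq_a, seq_b):
    """Fraction of identical positions between two equal-length sequences.

    Simplification for illustration: no gaps, no alignment.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(seq_a, seq_b))
    return matches / len(seq_a)


def would_merge(seq_a, seq_b, threshold):
    """True if the pair meets the identity threshold and would be clustered."""
    return ungapped_identity(seq_a, seq_b) >= threshold


# Two hypothetical paralogs: 100 bases, differing at 4 positions.
substitute = {"A": "C", "C": "G", "G": "T", "T": "A"}
paralog_a = "ACGT" * 25
bases = list(paralog_a)
for pos in (3, 30, 60, 90):
    bases[pos] = substitute[bases[pos]]
paralog_b = "".join(bases)

identity = ungapped_identity(paralog_a, paralog_b)
print(f"identity = {identity:.2f}")
print("merged at 0.95:", would_merge(paralog_a, paralog_b, 0.95))
print("merged at 0.98:", would_merge(paralog_a, paralog_b, 0.98))
```

So a lower threshold removes more redundancy but is also more likely to merge members of the same family, which is the trade-off raised in the original question.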
