Question: Is there any alternative to CD-HIT for removing redundancy from an assembled Trinity output file?
rahmati.razieh83 wrote (4 months ago):

Hi everyone

I have a problem reducing the redundancy in a Trinity output file. My assembled FASTA file from Trinity contains 302,000 contigs and shows a lot of redundancy. I used CD-HIT to remove the redundancy and obtain unigenes, but after CD-HIT the number of contigs only dropped to 240,000, and these still show a lot of redundancy. CD-HIT was not effective for obtaining unigenes. Could you give me advice on how to obtain unigenes and remove the redundancy?

Thanks


Did you tweak the identity threshold using the -c option of cd-hit?

(reply by Sej Modha, 4 months ago)
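A minimal Python sketch of such a threshold sweep, assuming cd-hit-est (the nucleotide version of CD-HIT) is installed and on the PATH: it runs one clustering per -c value and counts how many representative sequences survive each run. The -n word sizes follow the ranges given in the CD-HIT user guide; the helper names and output file names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def word_size_for(identity):
    """Word size (-n) for cd-hit-est, per the ranges in the CD-HIT user guide."""
    if identity >= 0.95:
        return 10
    if identity >= 0.90:
        return 8
    if identity >= 0.88:
        return 7
    if identity >= 0.85:
        return 6
    if identity >= 0.80:
        return 5
    if identity >= 0.75:
        return 4
    raise ValueError("the user guide lists no word size for -c below 0.75")


def cdhit_est_command(fasta_in, fasta_out, identity, threads=4):
    """Build the cd-hit-est argument list for one identity threshold."""
    return [
        "cd-hit-est",
        "-i", str(fasta_in),
        "-o", str(fasta_out),
        "-c", str(identity),
        "-n", str(word_size_for(identity)),
        "-T", str(threads),  # number of threads
        "-M", "0",           # memory limit in MB; 0 means unlimited
    ]


def count_fasta_records(path):
    """Count sequences in a FASTA file by counting its header lines."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def sweep(fasta_in, thresholds=(0.99, 0.95, 0.90)):
    """Run cd-hit-est once per threshold and return {threshold: n_sequences}."""
    if shutil.which("cd-hit-est") is None:
        raise FileNotFoundError("cd-hit-est was not found on the PATH")
    counts = {}
    for identity in thresholds:
        out = Path(f"cdhit_c{identity}.fasta")
        subprocess.run(cdhit_est_command(fasta_in, out, identity), check=True)
        counts[identity] = count_fasta_records(out)
    return counts
```

For example, `sweep("Trinity.fasta")` would report how far the contig count falls at 0.99, 0.95 and 0.90. Lower thresholds merge more aggressively, at a growing risk of collapsing genuinely distinct genes such as close paralogs.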

When you say "240000 contigs showing lots of redundancies again", how did you verify that?

Try using TGICL.
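One way to check what kind of redundancy remains is to look at Trinity's own grouping. Trinity names contigs like TRINITY_DN1000_c115_g5_i1, where the trailing _i<N> marks an isoform of the gene TRINITY_DN1000_c115_g5, so a large share of the contigs may simply be isoforms of the same Trinity gene. A minimal Python sketch, assuming the headers follow that naming convention (the helper names are hypothetical):

```python
import re
from collections import Counter

# Trinity isoform IDs end in _i<number>; removing it leaves the gene ID.
ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def gene_id(header):
    """Return the Trinity gene ID for a FASTA header line."""
    contig = header[1:].split()[0]  # drop '>' and anything after the ID
    return ISOFORM_SUFFIX.sub("", contig)


def isoforms_per_gene(lines):
    """Count contigs per Trinity gene, looking only at header lines."""
    return Counter(gene_id(line) for line in lines if line.startswith(">"))


def summarize(path):
    """Return (number of contigs, number of genes, genes with >1 isoform)."""
    with open(path) as handle:
        counts = isoforms_per_gene(handle)
    n_contigs = sum(counts.values())
    n_genes = len(counts)
    multi_isoform = sum(1 for n in counts.values() if n > 1)
    return n_contigs, n_genes, multi_isoform
```

Running `summarize` on both Trinity.fasta and the CD-HIT output shows how many isoforms per Trinity gene survived clustering. It cannot detect redundancy between different Trinity genes (for example, fragments of one real gene assembled separately); that requires comparing the sequences themselves.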

(reply by Vijay Lakhujani, 4 months ago)
h.mon (Brazil) wrote (4 months ago):

Trinity has a relatively new script that constructs "SuperTranscripts" from the gene-to-isoform relationships and the sequence graph structure that Trinity uses during assembly. I think this will give a better representation of unigenes than CD-HIT does.

$TRINITY_HOME/Analysis/SuperTranscripts/Trinity_gene_splice_modeler.py \
   --trinity_fasta Trinity.fasta
Jake Warner wrote (4 months ago):

Getting 'unigenes' from Trinity assemblies is tricky business. I've found that Corset performs better than CD-HIT. Another idea is to BLAST all the transcripts against each other and group them by reciprocal best BLAST hit.
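The reciprocal-best-hit idea can be sketched in Python. This assumes an all-vs-all search has already been run, for example with `makeblastdb -in Trinity.fasta -dbtype nucl` followed by `blastn -query Trinity.fasta -db Trinity.fasta -outfmt 6 -out self_hits.tsv`, so that column 12 of each line is the bit score. The function names are hypothetical, and this is not how Corset works (Corset clusters contigs using reads that map to several of them).

```python
def best_hits(blast_lines):
    """Map each query to its highest-bit-score non-self subject (BLAST -outfmt 6)."""
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query == subject:
            continue  # skip each transcript's hit to itself
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_pairs(best):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    return {tuple(sorted((q, s))) for q, s in best.items() if best.get(s) == q}


def group_transcripts(all_ids, best):
    """Reciprocal pairs first, then every unpaired transcript as a singleton."""
    pairs = reciprocal_pairs(best)
    paired = {tid for pair in pairs for tid in pair}
    singletons = [(tid,) for tid in sorted(all_ids) if tid not in paired]
    return sorted(pairs) + singletons
```

Usage would look like `best = best_hits(open("self_hits.tsv"))` followed by `group_transcripts(all_ids, best)`, where `all_ids` holds every contig ID from the FASTA. Because each transcript has only one best hit, reciprocal best hits pair transcripts up two at a time; a gene with three isoforms can therefore end up as one pair plus a singleton.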


LACE and Corset are tools from the same group. Initially I thought LACE would be the preferred tool, as it was developed more recently, but I was wrong: according to one of the authors of both tools, they should be equivalent for gene-level differential expression analysis. As Trinity's Trinity_gene_splice_modeler.py is based on the same algorithm as LACE, it should also be equivalent to Corset.

(reply by h.mon, 4 months ago)