Question: Is there any alternative to CD-HIT for removing redundancy from an assembled Trinity output file?
16 days ago, rahmati.razieh83 wrote:

Hi everyone

I have a problem reducing the redundancy in my Trinity output. The assembled FASTA file from Trinity contains 302,000 contigs, many of them redundant. I used CD-HIT to remove the redundant sequences and obtain unigenes, but afterwards there were still 240,000 contigs, again with a lot of redundancy, so CD-HIT was not effective for obtaining unigenes. Please advise: how can I remove the redundancy and obtain unigenes?

Thanks


Did you tweak the sequence identity threshold with cd-hit's -c option?

written 16 days ago by Sej Modha

When you say "240000 contigs showing lots of redundancies again", how do you verify that?

Try using TGICL
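
One quick way to check how much redundancy remains is to compare the number of Trinity "genes" with the number of isoforms in the FASTA file. The following is only a rough sketch: the function name `trinity_gene_isoform_counts` is made up for illustration, and it assumes standard Trinity headers of the form `>TRINITY_DN1000_c0_g1_i2 len=...`, where the trailing `_i<N>` marks the isoform and everything before it is the gene ID.

```shell
# Summarise how many Trinity "genes" and isoforms a FASTA file contains.
# Assumption: headers look like ">TRINITY_DN1000_c0_g1_i2 len=...",
# i.e. the gene ID is the transcript ID without its "_i<N>" suffix.
# Usage (file name is only an example):
#   trinity_gene_isoform_counts Trinity.fasta
trinity_gene_isoform_counts() {
    grep '^>' "$1" |
        awk '{
            id = substr($1, 2)            # drop the leading ">"
            gene = id
            sub(/_i[0-9]+$/, "", gene)    # strip the isoform suffix
            isoforms++
            if (!(gene in seen)) { seen[gene] = 1; genes++ }
        }
        END { printf "genes=%d isoforms=%d\n", genes, isoforms }'
}
```

If the isoform count is much larger than the gene count, a good share of the apparent redundancy is probably isoforms of the same gene rather than duplicated genes.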

modified 16 days ago • written 16 days ago by Vijay Lakhujani
16 days ago, h.mon (Brazil) wrote:

Trinity has a relatively new script to construct "SuperTranscripts" based on the gene-to-isoform relationships and the sequence graph structure leveraged by Trinity during assembly. I think this will result in a better representation of unigenes than using CD-HIT.

$TRINITY_HOME/Analysis/SuperTranscripts/Trinity_gene_splice_modeler.py \
   --trinity_fasta Trinity.fasta
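
The gene-to-isoform relationships mentioned above are encoded in the Trinity FASTA headers, so they can be listed explicitly as a two-column map. This is a minimal sketch, not part of the SuperTranscripts script: the function name `trinity_gene_trans_map` is hypothetical, and it assumes the usual `TRINITY_DN..._c..._g..._i...` naming, where the gene ID is the transcript ID without the `_i<N>` suffix.

```shell
# Print a gene<TAB>transcript line for every sequence in a Trinity FASTA file.
# Assumption: the gene ID is the transcript ID minus its "_i<N>" suffix.
# Usage (file names are only examples):
#   trinity_gene_trans_map Trinity.fasta > gene_trans_map.tsv
trinity_gene_trans_map() {
    awk '/^>/ {
        transcript = substr($1, 2)        # header ID without ">"
        gene = transcript
        sub(/_i[0-9]+$/, "", gene)        # drop the isoform suffix
        print gene "\t" transcript
    }' "$1"
}
```

A map like this makes it easy to see which isoforms would be collapsed into each gene-level sequence.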
14 days ago, Jake Warner wrote:

Getting 'unigenes' from Trinity assemblies is a tricky business. I've found that Corset performs better than CD-HIT. Another idea is to BLAST all the transcripts and group them by reciprocal best BLAST hit.
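
As a rough illustration of the reciprocal-best-hit idea, the sketch below assumes an all-vs-all BLAST of the transcripts has already been saved in tabular format (`-outfmt 6`), where column 1 is the query, column 2 the subject and column 12 the bit score. The function name `reciprocal_best_hits` and the file name in the usage comment are placeholders. It only reports pairs; turning the pairs into larger groups would need an extra clustering step.

```shell
# Report reciprocal best hits from an all-vs-all BLAST of the transcripts.
# Assumption: input is BLAST tabular output (-outfmt 6), tab-separated,
# with query in column 1, subject in column 2 and bit score in column 12.
# Self hits are ignored; ties keep the first hit seen.
# Usage (file name is only an example):
#   reciprocal_best_hits all_vs_all.tsv
reciprocal_best_hits() {
    awk -F '\t' '
        $1 != $2 && (!($1 in score) || $12 + 0 > score[$1]) {
            score[$1] = $12 + 0       # best bit score seen for this query
            best[$1]  = $2            # subject that produced it
        }
        END {
            for (q in best) {
                s = best[q]
                # a pair is reciprocal when each is the best hit of the other;
                # "q < s" prints every pair only once
                if ((s in best) && best[s] == q && q < s) print q "\t" s
            }
        }' "$1" | sort
}
```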


LACE and Corset are tools from the same group. Initially I thought LACE would be the preferred tool, as it was developed more recently, but I was wrong: according to one of the authors of both tools, they should be equivalent for the purpose of gene-level differential expression analysis. As Trinity's Trinity_gene_splice_modeler.py is based on the same algorithm as LACE, it should also be equivalent to Corset.

written 8 days ago by h.mon