Question: How to cluster transcripts assembled using TRINITY into unigene after blast?
sbchua.1990 wrote (2.3 years ago):

I have assembled a transcriptome using Trinity and obtained 100,000 transcripts. I have run blastx against the NR database (-outfmt 5). I want to remove redundancy in the assembled transcripts before proceeding to further processing.

1) How can I cluster them into unigenes and remove redundancy?

2) Alternatively, how can I select the longest transcript when several transcripts return the same hit?

3) Can you suggest any software for doing this?

4) Is it necessary to remove redundancy?

Thank you very much.

Tags: rna-seq, assembly, gene

See also the Trinity FAQ: https://github.com/trinityrnaseq/trinityrnaseq/wiki/Trinity-FAQ#ques_why_so_many_transcripts

Comment written 24 months ago by Michael Dondrup

lakhujanivijay (India) wrote (2.3 years ago):

Assembling short reads gives you transcripts. Because of alternative splicing, one gene can have more than one transcript.

Clustering (grouping) puts these similar sequences together and helps build the set of transcripts that belong to one gene.
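As a rough illustration of grouping transcripts by gene, here is a minimal Python sketch that keeps the longest isoform per Trinity gene. It assumes the usual Trinity header shape (for example `TRINITY_DN1000_c115_g5_i1`), where everything before the `_i<number>` suffix is the gene ID. The function names `read_fasta` and `longest_isoform_per_gene` are made up for this example and are not part of Trinity.

```python
import re

# Assumed Trinity ID shape: TRINITY_DN1000_c115_g5_i1 (gene part ends at _g<number>).
TRINITY_ID = re.compile(r"^(?P<gene>.+_g\d+)_i\d+$")


def read_fasta(lines):
    """Yield (sequence_id, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # The ID is the first whitespace-delimited token of the header.
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def longest_isoform_per_gene(records):
    """Return {gene_id: (transcript_id, sequence)}, keeping the longest isoform."""
    longest = {}
    for transcript_id, seq in records:
        match = TRINITY_ID.match(transcript_id)
        # IDs that do not follow the assumed pattern are treated as their own gene.
        gene = match.group("gene") if match else transcript_id
        if gene not in longest or len(seq) > len(longest[gene][1]):
            longest[gene] = (transcript_id, seq)
    return longest


# Tiny made-up example: two isoforms of one gene, one isoform of another.
EXAMPLE_FASTA = """>TRINITY_DN1_c0_g1_i1 len=8
ACGTACGT
>TRINITY_DN1_c0_g1_i2 len=12
ACGTACGTACGT
>TRINITY_DN2_c0_g1_i1 len=4
ACGT
""".splitlines()

EXAMPLE_RESULT = longest_isoform_per_gene(read_fasta(EXAMPLE_FASTA))
```

For a real assembly you would pass an open file instead of the example lines, e.g. `longest_isoform_per_gene(read_fasta(open("Trinity.fasta")))`. Keep in mind that the longest isoform is only a convenient representative; it is not necessarily the most expressed or the biologically most relevant one.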

The assembly process can also create transcripts that are not real (for example, sequences with more than 95% identity to another transcript), and clustering helps to identify them. The workflow is therefore:

Assembly -> Transcripts -> clustering transcripts -> unigenes -> further downstream processing.
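For question 2 (several transcripts returning the same hit), one possible way to collapse them is to read the blastx XML (-outfmt 5), take the first hit of each query, and keep the longest transcript per hit. The sketch below uses only the Python standard library and the standard BLAST XML element names (`Iteration`, `Iteration_query-def`, `Iteration_query-len`, `Hit_accession`). The function names are invented for this example, and using the top hit accession as the grouping key is an assumption, not an established definition of a unigene.

```python
import io
import xml.etree.ElementTree as ET


def top_hits_from_blast_xml(source):
    """Yield (query_id, query_length, top_hit_accession) for queries with a hit.

    `source` is a file path or a binary file object containing BLAST XML.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "Iteration":
            continue
        # Hits are listed best first, so the first <Hit> is the top hit.
        first_hit = elem.find("Iteration_hits/Hit")
        if first_hit is not None:
            # The query definition line starts with the transcript ID.
            query_id = elem.findtext("Iteration_query-def").split()[0]
            query_len = int(elem.findtext("Iteration_query-len"))
            yield query_id, query_len, first_hit.findtext("Hit_accession")
        elem.clear()  # release parsed children; NR searches give large files


def longest_transcript_per_hit(records):
    """Return {hit_accession: (transcript_id, length)} for the longest transcript."""
    best = {}
    for query_id, query_len, hit in records:
        if hit not in best or query_len > best[hit][1]:
            best[hit] = (query_id, query_len)
    return best


# Made-up miniature BLAST XML: two transcripts hit PROT_A, one has no hit.
EXAMPLE_XML = b"""<BlastOutput><BlastOutput_iterations>
<Iteration>
  <Iteration_query-def>TRINITY_DN1_c0_g1_i1 len=900</Iteration_query-def>
  <Iteration_query-len>900</Iteration_query-len>
  <Iteration_hits><Hit><Hit_accession>PROT_A</Hit_accession></Hit></Iteration_hits>
</Iteration>
<Iteration>
  <Iteration_query-def>TRINITY_DN1_c0_g1_i2 len=1500</Iteration_query-def>
  <Iteration_query-len>1500</Iteration_query-len>
  <Iteration_hits><Hit><Hit_accession>PROT_A</Hit_accession></Hit></Iteration_hits>
</Iteration>
<Iteration>
  <Iteration_query-def>TRINITY_DN2_c0_g1_i1 len=400</Iteration_query-def>
  <Iteration_query-len>400</Iteration_query-len>
  <Iteration_hits></Iteration_hits>
</Iteration>
</BlastOutput_iterations></BlastOutput>"""

EXAMPLE_BEST = longest_transcript_per_hit(
    top_hits_from_blast_xml(io.BytesIO(EXAMPLE_XML))
)
```

Transcripts without any hit are simply skipped here, so they would need to be handled separately (kept as they are, or clustered by sequence similarity instead).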

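You can use CD-HIT for the clustering step (cd-hit-est is the program in the CD-HIT package for nucleotide sequences such as transcripts).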
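A minimal sketch of how such a clustering run could be launched from Python, assuming `cd-hit-est` is installed and on your PATH. The flags used (`-i`, `-o`, `-c`, `-T`, `-M`) are standard CD-HIT options; the wrapper function names and the file names are made up for this example. The 0.95 identity threshold follows the figure mentioned above, and the sketch only accepts thresholds from 0.95 to 1.0 so that the default cd-hit-est word size remains appropriate.

```python
import shutil
import subprocess


def build_cd_hit_est_command(fasta_in, fasta_out, identity=0.95, threads=1,
                             memory_mb=0):
    """Return the argument list for a cd-hit-est run (hypothetical helper)."""
    # Lower thresholds need a smaller word size (-n); not handled in this sketch.
    if not 0.95 <= identity <= 1.0:
        raise ValueError("this sketch only supports identity in [0.95, 1.0]")
    return [
        "cd-hit-est",
        "-i", str(fasta_in),    # input FASTA of assembled transcripts
        "-o", str(fasta_out),   # output FASTA of cluster representatives
        "-c", str(identity),    # sequence identity threshold
        "-T", str(threads),     # number of threads
        "-M", str(memory_mb),   # memory limit in MB (0 means unlimited)
    ]


def run_cd_hit_est(fasta_in, fasta_out, **options):
    """Run cd-hit-est and raise if the program is missing or exits with an error."""
    if shutil.which("cd-hit-est") is None:
        raise FileNotFoundError("cd-hit-est was not found on PATH")
    command = build_cd_hit_est_command(fasta_in, fasta_out, **options)
    subprocess.run(command, check=True)
```

For example, `run_cd_hit_est("Trinity.fasta", "unigenes.fasta", threads=4)` would write the representative sequences to `unigenes.fasta` and the cluster membership to `unigenes.fasta.clstr`. Whether 0.95 is the right threshold depends on your data, so treat it as a starting point.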


Thanks. CD-HIT works well.

Comment written 2.3 years ago by sbchua.1990