Trinity de novo transcript assembly: too many transcripts
8.9 years ago
Mehmet ▴ 820

Dear All,

I have completed a Trinity de novo assembly for a worm species, using a k-mer size of 25. The assembly contains more genes and transcripts than I expected (example headers and summary statistics are below). How can I remove duplicated copies of the same transcript?

>c4_g1_i1 len=584 path=[53:0-583]
>c5_g1_i1 len=221 path=[47:0-166 213:167-220]
>c6_g1_i1 len=223 path=[735:0-15 737:16-222]
Total trinity 'genes':    29340
Total trinity transcripts:    37318
    Total assembled bases: 21926265
RNA-Seq Assembly • 4.0k views

Well, actually I don't think that number is too big. I've used Trinity for a while, and it's normal to get more genes and transcripts than you expected. I would cluster the transcripts by sequence identity and keep one representative per cluster, which reduces the overall size of the transcriptome by dropping only redundant (highly similar) sequences. It could also be a good idea to do some kind of contaminant filtering with BLAST (for example, removing transcripts that have human hits).
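To put the totals in perspective, 37318 transcripts over 29340 genes is about 1.27 transcripts per gene. A minimal sketch for counting isoforms per gene from the FASTA headers follows; it assumes the gene ID is the transcript ID with its trailing `_i<number>` removed, as in the headers posted in the question, and the function name `isoforms_per_gene` is only illustrative.

```python
import re
from collections import Counter

# Assumption: a Trinity transcript ID ends in "_i<number>" and the rest is the gene ID,
# e.g. "c4_g1_i1" -> gene "c4_g1".
ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def isoforms_per_gene(fasta_lines):
    """Count how many transcripts (isoforms) each Trinity gene has."""
    counts = Counter()
    for line in fasta_lines:
        if line.startswith(">"):
            # The ID is the first whitespace-delimited token after ">".
            transcript_id = line[1:].split()[0]
            gene_id = ISOFORM_SUFFIX.sub("", transcript_id)
            counts[gene_id] += 1
    return counts


# The three headers from the question; a real run would iterate over the FASTA file.
headers = [
    ">c4_g1_i1 len=584 path=[53:0-583]",
    ">c5_g1_i1 len=221 path=[47:0-166 213:167-220]",
    ">c6_g1_i1 len=223 path=[735:0-15 737:16-222]",
]
counts = isoforms_per_gene(headers)
print(len(counts), "genes,", sum(counts.values()), "transcripts")
```

For a whole assembly, pass an open file handle instead of the list, since the function only needs an iterable of lines.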


Thank you for your advice. Do you mind if I ask how to cluster the transcripts? Which tools or scripts can be used?


There are others, but I usually do the clustering with cd-hit.


cd-hit-est, to be specific, no?


Yes, exactly :)
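For anyone landing here later, a rough sketch of how a cd-hit-est run could be scripted is below. The file names `Trinity.fasta` and `Trinity.cdhit95.fasta`, the 0.95 identity threshold, and the thread and memory values are assumptions for illustration, not settings recommended in this thread. The flags used (`-i`, `-o`, `-c`, `-n`, `-T`, `-M`) are standard cd-hit-est options; the word size `-n` has to match the identity threshold, so check the CD-HIT user guide if you change `-c`.

```python
import shutil
import subprocess


def build_cdhit_command(fasta_in, fasta_out, identity=0.95, threads=4, memory_mb=4000):
    """Build a cd-hit-est command line as a list of arguments."""
    # Word size must be compatible with the identity threshold.
    # Only the two highest ranges are handled here; for lower thresholds,
    # look up the matching word size in the CD-HIT user guide.
    if identity >= 0.95:
        word_size = 10
    elif identity >= 0.90:
        word_size = 8
    else:
        raise ValueError("choose -n from the CD-HIT user guide for identity < 0.90")
    return [
        "cd-hit-est",
        "-i", fasta_in,        # input transcripts (nucleotide FASTA)
        "-o", fasta_out,       # representative sequences; a .clstr file is written alongside
        "-c", str(identity),   # sequence identity threshold
        "-n", str(word_size),  # word size
        "-T", str(threads),    # number of threads
        "-M", str(memory_mb),  # memory limit in MB
    ]


def run_cdhit(fasta_in, fasta_out, **options):
    """Run cd-hit-est if it is installed; raise if it is missing or exits with an error."""
    if shutil.which("cd-hit-est") is None:
        raise FileNotFoundError("cd-hit-est not found on PATH")
    subprocess.run(build_cdhit_command(fasta_in, fasta_out, **options), check=True)


# Hypothetical file names; print the command so it can be inspected before running.
command = build_cdhit_command("Trinity.fasta", "Trinity.cdhit95.fasta")
print(" ".join(command))
```

Calling `run_cdhit("Trinity.fasta", "Trinity.cdhit95.fasta")` would then execute it. Building the argument list separately from running it makes it easy to check the exact command first.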
