Trinity de novo transcript assembly too many transcripts
7.8 years ago
Mehmet ▴ 790

Dear All,

I have completed a Trinity de novo assembly for a worm species, using a k-mer size of 25. The assembly contains more genes and transcripts than I expected. How can I remove duplicate copies of the same transcript?

>c4_g1_i1 len=584 path=[53:0-583]
>c5_g1_i1 len=221 path=[47:0-166 213:167-220]
>c6_g1_i1 len=223 path=[735:0-15 737:16-222]
Total trinity 'genes':    29340
Total trinity transcripts:    37318
Total assembled bases: 21926265
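
For reference, the gene and transcript totals above can be reproduced from the FASTA headers alone. This is a minimal sketch, assuming every header follows the `c<N>_g<N>_i<N>` pattern of the three headers shown; the function name and the `ACGT` sequence lines are placeholders, not part of the real assembly.

```python
import re
from collections import defaultdict

# Assumed header layout, inferred from the three headers in the post:
# >c<component>_g<gene>_i<isoform> len=... path=[...]
HEADER_RE = re.compile(r"^>(c\d+_g\d+)_i\d+")


def count_genes_and_transcripts(fasta_lines):
    """Return (n_genes, n_transcripts, isoforms_per_gene) from FASTA lines."""
    isoforms_per_gene = defaultdict(int)
    for line in fasta_lines:
        match = HEADER_RE.match(line)
        if match:
            # Everything before "_i<N>" identifies the Trinity 'gene'.
            isoforms_per_gene[match.group(1)] += 1
    n_transcripts = sum(isoforms_per_gene.values())
    return len(isoforms_per_gene), n_transcripts, dict(isoforms_per_gene)


# Headers copied from the post; the sequence lines are placeholders.
example = [
    ">c4_g1_i1 len=584 path=[53:0-583]",
    "ACGT",
    ">c5_g1_i1 len=221 path=[47:0-166 213:167-220]",
    "ACGT",
    ">c6_g1_i1 len=223 path=[735:0-15 737:16-222]",
    "ACGT",
]
genes, transcripts, per_gene = count_genes_and_transcripts(example)
print(genes, transcripts)
```

With the totals reported above (37318 transcripts over 29340 genes), the average works out to roughly 1.27 transcripts per gene.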

RNA-Seq Assembly • 3.6k views

Actually, I don't think that number is too big. I've used Trinity for a while, and it's normal to get more genes and transcripts than you expected. I would cluster the transcripts by sequence identity: this reduces the overall size of the transcriptome without losing sequence information, since only 'redundant' (highly similar) sequences are removed. It could also be a good idea to filter contaminants with BLAST (for example, to remove transcripts that have human hits).

Thank you for your advice. Do you mind if I ask how to cluster the transcripts? Which tools or scripts can be used?

There are other tools, but I usually do the clustering with cd-hit.

cd-hit-est, to be specific, no?

Yes, exactly :)
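
To make the cd-hit-est suggestion concrete, here is a minimal sketch of how the command could be assembled from Python. The file names, the 0.95 identity threshold, and the thread and memory values are assumptions for illustration, not settings anyone in this thread recommended. The flags used (`-i`, `-o`, `-c`, `-n`, `-T`, `-M`) are standard cd-hit-est options, and a word size of 10 is, to my knowledge, in the range the CD-HIT user guide suggests for thresholds of 0.95 and above.

```python
import shlex


def build_cd_hit_est_command(fasta_in, fasta_out, identity=0.95,
                             word_size=10, threads=4, memory_mb=8000):
    """Build a cd-hit-est argument list (all defaults here are assumptions)."""
    if not 0.0 < identity <= 1.0:
        raise ValueError("identity must be a fraction in (0, 1]")
    return [
        "cd-hit-est",
        "-i", fasta_in,         # input FASTA, e.g. the Trinity output
        "-o", fasta_out,        # representative (non-redundant) sequences
        "-c", str(identity),    # sequence identity threshold
        "-n", str(word_size),   # word size; should match the threshold range
        "-T", str(threads),     # number of threads
        "-M", str(memory_mb),   # memory limit in MB
    ]


# Hypothetical file names; substitute your own assembly path.
cmd = build_cd_hit_est_command("Trinity.fasta", "Trinity.cdhit95.fasta")
command_line = " ".join(shlex.quote(arg) for arg in cmd)
print(command_line)
```

Once cd-hit-est is on your PATH, the list can be passed to `subprocess.run(cmd, check=True)`. Besides the output FASTA, cd-hit-est writes a `.clstr` file next to it that lists which transcripts were grouped together, so it is worth checking that before discarding anything.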