Hi, I just finished a transcriptome assembly using Trinity. However, the assembly contains far more transcripts (~300k) than I would expect for my sample, and I believe most of them are redundant. How can I remove these redundant transcripts?
1) I already tried CD-HIT-EST. Unfortunately, the output still contains many redundant transcripts.
2) I also tried Corset, following the tutorial here (https://github.com/Oshlack/Corset/wiki/Example). However, I am stuck on how to recover the unigene sequences from the Corset output.
3) I plan to try TGICL to further remove redundant sequences from the CD-HIT output, as some studies have done. However, I am not familiar with TGICL and do not know which parameters to use.
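For item 2, this is roughly what I have in mind, in case it helps someone point out what I am missing. It is a minimal Python sketch that keeps the longest transcript in each Corset cluster as the unigene. It assumes the Corset clusters file is tab-separated, with the transcript ID in the first column and the cluster ID in the second. The file names (`Trinity.fasta`, `clusters.txt`, `unigenes.fasta`) and all function names are my own placeholders, and picking the longest transcript is only one possible way to choose a representative.

```python
from collections import defaultdict


def read_fasta(lines):
    """Parse FASTA lines into {sequence_id: sequence}.

    The sequence ID is taken as the first word of the header line.
    """
    seqs = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            seqs[current] = []
        elif current is not None:
            seqs[current].append(line)
    return {name: "".join(parts) for name, parts in seqs.items()}


def read_clusters(lines):
    """Parse cluster lines into {cluster_id: [transcript_id, ...]}.

    Assumption: two tab-separated columns, transcript ID then cluster ID.
    """
    clusters = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        clusters[fields[1]].append(fields[0])
    return clusters


def longest_per_cluster(seqs, clusters):
    """Return {cluster_id: (transcript_id, sequence)} for the longest member."""
    chosen = {}
    for cluster_id, members in clusters.items():
        # Ignore transcript IDs that are not present in the FASTA file.
        present = [m for m in members if m in seqs]
        if not present:
            continue
        best = max(present, key=lambda m: len(seqs[m]))
        chosen[cluster_id] = (best, seqs[best])
    return chosen


def main(fasta_path="Trinity.fasta", clusters_path="clusters.txt",
         out_path="unigenes.fasta"):
    """Write one representative sequence per cluster to out_path."""
    with open(fasta_path) as fh:
        seqs = read_fasta(fh)
    with open(clusters_path) as fh:
        clusters = read_clusters(fh)
    picked = longest_per_cluster(seqs, clusters)
    with open(out_path, "w") as out:
        for cluster_id, (transcript_id, seq) in sorted(picked.items()):
            out.write(f">{cluster_id} {transcript_id}\n{seq}\n")


if __name__ == "__main__":
    import sys

    # Only runs when exactly three paths are given on the command line.
    if len(sys.argv) == 4:
        main(*sys.argv[1:4])
```

I would run it as `python pick_unigenes.py Trinity.fasta clusters.txt unigenes.fasta` (the script name is also just a placeholder). Transcripts that Corset did not assign to any cluster would be dropped by this approach, and I am not sure whether that is the intended behaviour.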
I would be grateful if somebody could help with my problem. Thanks.