I have created de novo assemblies from RNA-seq reads using Velvet/Oases for different subjects at several time points. For every subject I have a merged file of ~100,000 de novo transcripts, created by merging transcripts assembled with different k-mer sizes. My ultimate goal is to perform differential expression analysis on this data set. The next step is to create a reference transcriptome that contains all the transcripts from all subjects and time points, with no ambiguity, so that I can map the de novo transcripts to that reference and quantify expression.
My question is about which program can merge all the transcripts from all subjects and time points into a transcriptome that contains just one copy of each transcript without losing any of the de novo transcripts that were found. Any suggestions? Is cd-hit a good option?
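To make concrete what I mean by "one copy of the same transcript", here is a minimal Python sketch. The function names, sample labels, and toy sequences are all hypothetical, made up for illustration. It pools FASTA records from several assemblies and keeps only the first copy of each exactly identical sequence. It does not collapse near-identical transcripts or transcripts contained within longer ones, which is the part I am hoping a clustering tool such as cd-hit could handle.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_exact_duplicates(assemblies):
    """Pool records from several assemblies, one copy per exact sequence.

    `assemblies` maps a label (e.g. subject and time point) to FASTA text.
    Returns a list of (header, sequence); each header is prefixed with the
    label of the assembly in which the sequence was first seen.
    """
    seen = set()
    merged = []
    for label, text in assemblies.items():
        for header, seq in parse_fasta(text):
            key = seq.upper()  # compare case-insensitively
            if key in seen:
                continue  # exact duplicate of a sequence already kept
            seen.add(key)
            merged.append((f"{label}|{header}", seq))
    return merged


# Toy data, invented for illustration only (not real transcripts).
example = {
    "subjA_t1": ">tx1\nACGTACGT\n>tx2\nGGGCCC\n",
    "subjB_t1": ">tx1\nacgtacgt\n>tx9\nTTTTAAAA\n",
}
merged = merge_exact_duplicates(example)
for header, seq in merged:
    print(f">{header}\n{seq}")
```

In this toy case the second subject's tx1 is dropped because its sequence matches one already kept, while every distinct sequence survives. With ~100,000 transcripts per subject the same idea applies, but real redundancy is mostly near-identical sequences rather than exact copies.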
Thank you in advance for your help. I really appreciate it.