5.5 years ago
mxlsherry1992
Dear all,
I have RNA-seq data for a species whose related species has a sequenced genome with about 20,000 genes. However, we need to use a Trinity de novo assembly for our species. I have 7 time points, with 2 replicates for each time point. After Trinity de novo assembly, I got about 1,319,212 transcript sequences. I then used CD-HIT to remove redundancy and "get_longest_isoform_seq_per_trinity_gene.pl" to get the longest isoform per gene, but there are still about 650,000 transcript sequences. I expected this species to have only about 40,000-60,000 transcripts, but the actual number is far higher.
Does anyone know what the problem might be?
Thanks!
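The longest-isoform step described above can be sketched in Python as follows. This is a minimal illustration, not the code of `get_longest_isoform_seq_per_trinity_gene.pl`. It assumes Trinity's usual header naming (e.g. `TRINITY_DN1000_c115_g5_i1`), where the gene ID is the isoform name without the trailing `_i<N>` suffix. The function names `read_fasta` and `longest_isoform_per_gene` are hypothetical.

```python
import re

# Assumed Trinity naming convention: TRINITY_DN1000_c115_g5_i1.
# The "gene" ID is the isoform ID with the trailing _i<N> suffix removed.
ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            # Sequences may be wrapped over several lines; join them.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longest_isoform_per_gene(records):
    """Return {gene_id: (isoform_id, sequence)}, keeping the longest isoform."""
    best = {}
    for header, seq in records:
        isoform_id = header.split()[0]  # drop "len=... path=..." annotations
        gene_id = ISOFORM_SUFFIX.sub("", isoform_id)
        if gene_id not in best or len(seq) > len(best[gene_id][1]):
            best[gene_id] = (isoform_id, seq)
    return best


if __name__ == "__main__":
    example = [
        ">TRINITY_DN1_c0_g1_i1 len=5", "ACGTA",
        ">TRINITY_DN1_c0_g1_i2 len=8", "ACGT", "ACGT",
        ">TRINITY_DN2_c0_g1_i1 len=3", "ACG",
    ]
    genes = longest_isoform_per_gene(read_fasta(example))
    print(len(genes))  # 2: one entry per Trinity gene
```

Because this only collapses isoforms that share a Trinity gene ID, fragments of one real gene that Trinity assembled as separate "genes" are still counted separately.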
There is a similar question here: De novo transcriptome assembly produce too many transcripts
Thank you for your reply! I looked through some papers, and it seems that related species also have many contigs reported. But the "contigs" they refer to mean the same thing as transcripts, right? (Just the number of sequences in the FASTA file.) :)
Thank you!
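Counting the sequences in a FASTA file amounts to counting its header lines (those starting with `>`), which is what `grep -c '^>' file.fasta` does. A minimal Python equivalent, with a hypothetical function name `count_fasta_records`:

```python
import sys


def count_fasta_records(path):
    """Count sequences in a FASTA file by counting header lines starting with '>'."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


if __name__ == "__main__":
    # Hypothetical usage: python count_fasta.py Trinity.fasta
    for path in sys.argv[1:]:
        print(path, count_fasta_records(path))
```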
I do not know about your specific project, but in general, contigs come from genome assemblies and transcripts come from transcriptome assemblies. These are two different things:
- In genome assembly, your goal is to create a complete representation of the genome (ideally a FASTA containing one sequence per chromosome).
- Transcriptome assembly aims to generate the whole set of transcripts (the FASTA will contain many sequences, corresponding to the transcripts and their different isoforms).
I got it! Thank you!!