Question

De novo transcriptome assembly, annotation, blast using Trinity, transdecoder and blast

4.7 years ago

dimitrischat ▴ 210

Hello all, I have never done this before, so I will need some guidance. I am trying to do a de novo transcriptome assembly from a single RNA-seq FASTQ file using Trinity (does trimming make a difference or not?). After 18 h it produced a lot of files, but I am guessing the one I want is Trinity.fasta. After that I wanted to do annotation and BLAST, so I used TransDecoder (https://github.com/TransDecoder/TransDecoder/wiki):

TransDecoder.LongOrfs -t target_transcripts.fasta
TransDecoder.Predict -t target_transcripts.fasta

I am guessing that target_transcripts.fasta is Trinity.fasta? I got .pep, .cds, .gff3, and .bed files. The next step is:

util/gtf_genome_to_cdna_fasta.pl transcripts.gtf test.genome.fasta > transcripts.fasta

But I don't understand which file is transcripts.gtf, and which one is test.genome.fasta?

RNA-Seq Assembly • 2.0k views

updated 4.7 years ago by h.mon 35k • written 4.7 years ago by dimitrischat ▴ 210

score 0 · Answer 1 · 2020-02-07

You are mixing up the pipelines of the "Trinity suite" a bit. Trinity is the assembly pipeline (de novo and genome-guided), Trinotate is the functional annotation pipeline, and TransDecoder predicts the coding regions of transcripts. After the assembly with Trinity, you have to jump back and forth a bit between TransDecoder and Trinotate to get to the final transcriptome annotation.

However, you said you performed a de novo assembly, so you do not need the gtf_genome_to_cdna_fasta.pl script. That script is meant for starting the coding region prediction from a genome-based transcript structure GTF file, which is not your case.
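
For a de novo assembly, the two TransDecoder commands from the question can be pointed straight at the Trinity output. Below is a minimal sketch, assuming the assembly is named Trinity.fasta and that TransDecoder.LongOrfs and TransDecoder.Predict are on your PATH. The `run` and `predict_orfs` helpers and the `DRY_RUN` variable are hypothetical names made up for this example, not part of TransDecoder. By default the script only prints the commands so you can check them first.

```shell
#!/usr/bin/env bash
set -eu

# Print each command when DRY_RUN=1 (the default here); execute it when DRY_RUN=0.
# DRY_RUN and run() are illustrative helpers, not TransDecoder features.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Predict coding regions directly on the assembled transcripts (no GTF or genome needed).
predict_orfs() {
  assembly="$1"
  run TransDecoder.LongOrfs -t "$assembly"   # step 1: extract the long open reading frames
  run TransDecoder.Predict -t "$assembly"    # step 2: predict the likely coding regions
}

# Use the first argument as the assembly if given, else assume Trinity.fasta.
predict_orfs "${1:-Trinity.fasta}"
```

Run it with `DRY_RUN=0` to actually execute the two steps. The .pep, .cds, .gff3 and .bed files mentioned in the question should then appear in the working directory, named after the input FASTA.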