Hello! I've assembled a transcriptome de novo with Trinity, resulting in Trinity.fasta, whose headers look like this:
>TRINITY_DN29256_c0_g1_i1 len=323 path=[0:0-322]
Followed, on the next line, by the sequence.
To run an external downstream analysis with an R script, I need a .gff3 reference file (for the featureCounts function from Rsubread). Of course, for now, annotation isn't needed, just names and coordinates.
I've already performed a classic edgeR analysis with Trinity, I'm just trying something different and need this very specific input file.
Can anyone help me here? Thanks in advance!
I do not have experience with Trinity, but I have seen similar cases where a GFF3 was obtained by mapping the Trinity fasta to the reference with GMAP. Maybe it can help in your case.
I've tried to use GMAP, with the following code, but the script seems to freeze for no reason and I get an empty output file.
What do you mean by reference? It's a de-novo assembly, because my organism is not a model one, so I don't really have one.
gmap is meant to map transcripts against a reference genome. The gff you get describes the location and the structure of the transcripts within the reference genome. As you don't have a reference genome, it is useless here.
What you can do is use TransDecoder to predict the coding regions within a transcript fasta file. The gff you get describes the feature type of the different regions in each sequence, i.e. the exon, what is coding (CDS) and what is non-coding (UTR).
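If only names and coordinates are needed, as in the original question, one option is to treat each Trinity transcript as its own reference sequence and write one feature spanning its full length. Below is a minimal Python sketch of that idea, not a tested pipeline. The function names `fasta_to_gff3` and `trinity_gene_id` are made up for illustration, and the choice of `exon` as the feature type and of `ID`/`gene_id` as attributes is an assumption about what the counting step will be told to look for. It only makes sense if the reads were mapped to Trinity.fasta itself.

```python
import re
import sys


def trinity_gene_id(transcript_id):
    """Drop the trailing _i<N> isoform suffix to get the Trinity gene ID."""
    return re.sub(r"_i\d+$", "", transcript_id)


def fasta_to_gff3(fasta_lines):
    """Return GFF3 lines with one full-length feature per FASTA record."""
    lengths = {}  # transcript ID -> sequence length, in file order
    current = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The ID is the first whitespace-delimited token of the header.
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)

    out = ["##gff-version 3"]
    for tid, length in lengths.items():
        # Attribute names are an assumption; adjust to what your tool expects.
        attrs = f"ID={tid};gene_id={trinity_gene_id(tid)}"
        # GFF3 coordinates are 1-based and inclusive: 1..length covers the contig.
        out.append("\t".join(
            [tid, "Trinity", "exon", "1", str(length), ".", "+", ".", attrs]
        ))
    return out


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python fasta_to_gff3.py Trinity.fasta Trinity.gff3
    with open(sys.argv[1]) as fin, open(sys.argv[2], "w") as fout:
        fout.write("\n".join(fasta_to_gff3(fin)) + "\n")
```

The length is counted from the sequence rather than taken from the `len=` field, so the sketch does not depend on the header layout beyond the ID being the first token.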
Hi Juke34, if you're able, can you please clarify something for me (based on the answer you've given here)? Thanks a lot in advance. I also have a similar issue where I know that I need a gtf or gff file for downstream mapping, but not sure which approach is best. I've already conducted a de novo Trinity reconstruction for my non-model species, and I've completed the Transdecoder and Trinotate pipelines. We have a "good enough" genome for this species already, but I didn't use it for the reconstruction because we didn't want to be constrained by the genome. Downstream, we need to map some RNA-seq reads using this "good enough" genome, and my annotation is supposed to accompany this, but I'm unsure about the gtf file. Do I use the transdecoder one, or should I use GMAP to obtain one that's specific to my "good enough" genome? I would greatly appreciate some help. Thank you!
You will need to map your transdecoder fasta file to your genome in order to make an annotation that describes the location of your genes within this genome (GFF/GTF). You can use braker, Augustus, maker, PASA or another annotation tool.
Maybe map with minimap2 instead, then bamtobed, then to gff (or maybe there's a direct bam->gff converter...)
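For the last step of that route (BED to GFF), here is a rough Python sketch, assuming a 6-column BED as written by bamtobed (chrom, start, end, name, score, strand). The function name `bed_to_gff3`, the `minimap2` source label and the `match` feature type are placeholders chosen for illustration, not something any of the tools prescribe. The only real work is the coordinate shift: BED starts are 0-based, GFF3 starts are 1-based.

```python
def bed_to_gff3(bed_lines, source="minimap2", feature_type="match"):
    """Convert 6-column BED lines into GFF3 lines, one feature per BED row."""
    out = ["##gff-version 3"]
    count = 0
    for line in bed_lines:
        line = line.rstrip("\n")
        # Skip blank lines and BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, name, score, strand = line.split("\t")[:6]
        count += 1
        # Assumes names contain no ';' or '=' (true for Trinity IDs).
        attrs = f"ID=aln{count};Name={name}"
        # BED is 0-based half-open, GFF3 is 1-based inclusive:
        # add 1 to the start and keep the end unchanged.
        out.append("\t".join(
            [chrom, source, feature_type, str(int(start) + 1), end,
             score, strand, ".", attrs]
        ))
    return out
```

Note that a plain BED row for a spliced alignment covers the whole span including introns, so each transcript ends up as a single interval with no exon structure unless the alignments are split into blocks before conversion.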