Converting a de novo transcriptome assembled with Trinity to a .gff3 file
20 months ago
Light92 ▴ 60

Hello! I've assembled a de novo transcriptome with Trinity. The output is Trinity.fasta, whose headers look like this:

>TRINITY_DN29256_c0_g1_i1 len=323 path=[0:0-322]

Followed, in the next line, by the sequence.

To run an external downstream analysis with an R script, I need a .gff3 reference file (for the featureCounts function from Rsubread). For now, annotation isn't needed, just names and coordinates.

I've already performed a classic edgeR analysis with Trinity, I'm just trying something different and need this very specific input file.

Can anyone help me here? Thanks in advance!

Trinity • 1000 views

I do not have experience with Trinity, but I have seen similar cases where a GFF3 was obtained by mapping the Trinity fasta to the reference with GMAP. Maybe it can help in your case.


I've tried to use GMAP, with the following code, but the script seems to freeze for no reason and I get an empty output file.

gmap -d Trinity.fasta -f 3 > meh.gff3

What do you mean by reference? It's a de-novo assembly, because my organism is not a model one, so I don't really have one.


GMAP maps transcripts against a reference genome. The GFF it produces describes the location and structure of the transcripts within that genome. Since you don't have a reference genome, it is of no use here.

What you can do is use TransDecoder to predict the coding regions within a transcript FASTA file. The GFF you get describes the feature type of the different regions in each sequence, i.e. the exon, what is coding (CDS) and what is non-coding (UTR).
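
If only names and coordinates are needed, another option might be to treat each Trinity transcript as its own reference sequence and write one feature spanning its full length. Below is a minimal Python sketch of that idea. The function names, the `exon` feature type, the `+` strand and the `gene_id` attribute are all assumptions made for illustration; whether featureCounts accepts this exact layout has not been tested here.

```python
def fasta_lengths(lines):
    """Yield (name, length) per FASTA record; name is the header text up to the first whitespace."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # ">TRINITY_DN29256_c0_g1_i1 len=323 ..." -> "TRINITY_DN29256_c0_g1_i1"
            name, length = line[1:].split()[0], 0
        else:
            length += len(line)
    if name is not None:
        yield name, length


def fasta_to_gff3(lines, source="Trinity", feature="exon"):
    """Return GFF3 text with one full-length feature per transcript.

    Assumption: each transcript serves as its own seqid, so coordinates run 1..length.
    """
    out = ["##gff-version 3"]
    for name, length in fasta_lengths(lines):
        attributes = f"ID={name};gene_id={name}"  # attribute keys are illustrative
        out.append("\t".join(
            [name, source, feature, "1", str(length), ".", "+", ".", attributes]
        ))
    return "\n".join(out) + "\n"


# Example use (file names are placeholders):
# with open("Trinity.fasta") as fasta, open("Trinity.gff3", "w") as gff:
#     gff.write(fasta_to_gff3(fasta))
```

With Rsubread, the feature type and attribute name written here would have to match the `GTF.featureType` and `GTF.attrType` arguments passed to featureCounts, and the reads would have to be aligned to Trinity.fasta itself so that the sequence names agree.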


Maybe map with minimap2 instead, then bamtobed, then to gff (or maybe there's a direct bam->gff converter...)
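
For the last step of that route, here is a rough Python sketch of a BED to GFF3 conversion. It assumes six-column, tab-separated BED of the kind `bedtools bamtobed` writes (chrom, start, end, name, score, strand). The function name, the `exon` feature type and the `Name` attribute are choices made for this illustration, not part of any tool.

```python
def bed_to_gff3(bed_lines, source="minimap2", feature="exon"):
    """Convert BED records to GFF3 text.

    BED intervals are 0-based and half-open; GFF3 is 1-based and inclusive,
    so the start is shifted by one and the end is kept as is.
    """
    out = ["##gff-version 3"]
    for line in bed_lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and BED header lines
        cols = line.split("\t")
        chrom, start, end = cols[0], int(cols[1]), int(cols[2])
        name = cols[3] if len(cols) > 3 else f"{chrom}:{start}-{end}"
        score = cols[4] if len(cols) > 4 else "."
        strand = cols[5] if len(cols) > 5 and cols[5] in ("+", "-") else "."
        # Name (not ID) is used because one read or transcript can have
        # several alignments, and GFF3 IDs are expected to be unique.
        out.append("\t".join(
            [chrom, source, feature, str(start + 1), str(end), score, strand, ".",
             f"Name={name}"]
        ))
    return "\n".join(out) + "\n"
```

Note that this route still needs something to map against, so it only applies when a genome or other reference sequence is available.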

19 months ago
h.mon 32k

featureCounts assigns zero counts to multi-mapped reads. Trinity assemblies have a lot of "redundancy", as the assembler tries to recover all possible isoforms of a gene. This would mean a lot of the mapped reads would map to multiple locations (to several isoforms), and featureCounts would assign zero counts to all those reads. Better approaches to deal with this would be quantification with RSEM, Salmon or kallisto.
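
To make the effect concrete, here is a toy Python illustration of counting only uniquely mapped reads, which is the behaviour described above. The read names, transcript names and function name are made up for the example; it does not call featureCounts or reproduce its implementation.

```python
from collections import Counter

# Hypothetical alignments: read name -> transcripts the read aligns to.
toy_alignments = {
    "read1": ["TRINITY_DN1_c0_g1_i1"],
    "read2": ["TRINITY_DN1_c0_g1_i1", "TRINITY_DN1_c0_g1_i2"],
    "read3": ["TRINITY_DN1_c0_g1_i1", "TRINITY_DN1_c0_g1_i2"],
    "read4": ["TRINITY_DN2_c0_g1_i1"],
}


def count_unique_only(alignments):
    """Count reads per transcript, keeping only reads with exactly one target.

    Reads with zero or several distinct targets contribute to no transcript
    and are tallied as discarded.
    """
    counts = Counter()
    discarded = 0
    for targets in alignments.values():
        distinct = set(targets)
        if len(distinct) == 1:
            counts[next(iter(distinct))] += 1
        else:
            discarded += 1
    return counts, discarded


counts, discarded = count_unique_only(toy_alignments)
print(dict(counts), "discarded:", discarded)
```

In this toy set, the two reads shared between isoforms i1 and i2 count for neither, so isoform i2 ends up with no reads at all even though reads align to it.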
