GMAP gff wrong direction of transcripts?
15 months ago
maria • 0

Hi all,

I am assembling the transcriptome of a non-model organism. I assembled the RNA-seq reads with Trinity (genome-guided assembly) and, to get a GFF, mapped Trinity's FASTA output to the reference genome with GMAP. The GFF output has many records where the direction is given as 'sense' but the sign in the strand column is '-', and the same kind of mismatch shows up when the direction is 'antisense'. It struck me as odd because this seems to happen in almost exactly half of the cases. Here is an example:

LQNS02276481.1  phaw        gene    17436406        17487190        .       +       .       ID=TRINITY_GG_63141_c0_g1_i1.path1;Name=TRINITY_GG_63141_c0_g1_i1;**Dir=antisense**


I checked, and the correct direction always seems to be the one in 'Dir=', not the one in the strand column (the 7th column, '+' here).

The command I ran was:

gmap -d phaw --gff3-add-separators=0 -f 2 -n 1 Trinity-GG.fasta > gmap_phaw.gff3

GMAP version 2020-06-01 called with args: gmap.sse42

RNA-Seq rna-seq Assembly alignment • 658 views

The orientation of very short (single-exon) transcripts is very difficult to predict.

Try looking at longer transcripts with 2-3 exons. Check their orientation: does it make sense with respect to the ATG, the exons, etc.?

Visualize the GMAP gff3 in a genome browser and compare it with existing annotation. Does it fit?

3 months ago
alephreish ▴ 50

One year later: These are two different things. The strand (the 7th column) indicates the orientation of the gene in the genome and doesn't depend on the direction of the query sequence. The Dir= property indicates how the particular query sequence mapped onto the gene: 'sense' means its orientation is the same as that of the gene (even if the gene is located on the '-' strand), while 'antisense' means it is the reverse complement of the gene sequence.
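
If that reading is right, the strand on which the Trinity contig itself lies can be worked out by combining the two fields. Below is a minimal Python sketch of that idea, not anything GMAP provides: the function names `parse_gff3_line` and `query_strand_on_genome` are made up for illustration, the example record is the one from the question with the emphasis markers removed, and any Dir value other than 'sense' or 'antisense' is simply treated as unknown.

```python
def parse_gff3_line(line):
    """Split one GFF3 record into its strand (column 7) and attribute dict (column 9)."""
    # Split on any whitespace, at most 8 times, so the 9th field stays in one piece.
    fields = line.strip().split(None, 8)
    if len(fields) != 9:
        raise ValueError("expected 9 GFF3 columns, got %d" % len(fields))
    attributes = {}
    for pair in fields[8].split(";"):
        key, sep, value = pair.partition("=")
        if sep:
            attributes[key] = value
    return {"seqid": fields[0], "strand": fields[6], "attributes": attributes}


def query_strand_on_genome(gene_strand, direction):
    """Combine the gene strand with Dir= to get the strand the query sequence lies on.

    Assumes the interpretation given above: 'sense' means the query has the same
    orientation as the gene, 'antisense' means it is the reverse complement.
    Returns None when either value is not recognised.
    """
    flipped = {"+": "-", "-": "+"}
    if gene_strand not in flipped:
        return None
    if direction == "sense":
        return gene_strand
    if direction == "antisense":
        return flipped[gene_strand]
    return None


EXAMPLE = (
    "LQNS02276481.1\tphaw\tgene\t17436406\t17487190\t.\t+\t.\t"
    "ID=TRINITY_GG_63141_c0_g1_i1.path1;Name=TRINITY_GG_63141_c0_g1_i1;Dir=antisense"
)

record = parse_gff3_line(EXAMPLE)
contig_strand = query_strand_on_genome(record["strand"], record["attributes"].get("Dir"))
print(record["strand"], record["attributes"].get("Dir"), contig_strand)
```

For the record from the question (gene on '+', Dir=antisense) this puts the contig on the '-' strand, which would be consistent with roughly half of the contigs from an unstranded assembly coming out as antisense.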