Hi all,
I am trying to run featureCounts on miRNA reads from CHO cells for differential expression analysis. I created my own annotation file by mapping hairpin miRNA sequences downloaded from miRBase (http://www.mirbase.org/cgi-bin/mirna_summary.pl?org=cgr) against the Ensembl reference genome for CHO cells. For that purpose, I used gmap:
gmap -D ~/miRNA/crigri_gmap -d crigri_gmap -f 2 -n 0 -t 16 --gff3-cds=genomic hairpin_crigri_dna_mod.fa > trial_1.gff3
I get a .gff3 file with the 9 expected columns (seqid, source, type, start, end, score, strand, phase and attributes), but for some scaffolds I get a value of "-1" in the "phase" field. According to Ensembl (https://www.ensembl.org/info/website/upload/gff3.html), the phase should be 'One of '0', '1' or '2'. '0' indicates that the first base of the feature is the first base of a codon, '1' that the second base is the first base of a codon, and so on.' So I don't know how to interpret this value.
This is how the head of my .gff3 file looks:
scaffold_6 crigri_gmap gene 62246878 62246977 . + . ID=cgr_let_7a_MI0020368.path1;Name=cgr_let_7a_MI0020368
scaffold_6 crigri_gmap mRNA 62246878 62246977 . + . ID=cgr_let_7a_MI0020368.mrna1;Name=cgr_let_7a_MI0020368;Parent=cgr_let_7a_MI0020368.path1;coverage=100.0;identity=100.0;matches=100;mismatches=0;indels=0;unknowns=0
scaffold_6 crigri_gmap exon 1 62246878 100 + . ID=cgr_let_7a_MI0020368.mrna1.exon1;Name=cgr_let_7a_MI0020368;Parent=cgr_let_7a_MI0020368.mrna1;Target=cgr_let_7a_MI0020368 1 1 +
scaffold_6 crigri_gmap CDS 62246878 62246975 100 + -1 ID=cgr_let_7a_MI0020368.mrna1.cds1;Name=cgr_let_7a_MI0020368;Parent=cgr_let_7a_MI0020368.mrna1;Target=cgr_let_7a_MI0020368 1 98 +
Any help with this will be much appreciated. Thank you very much!!
This is strange; I have no idea why there is a negative value. To fix the phases, you can use agat_sp_fix_cds_phases.pl from AGAT.
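Before fixing anything, it may help to see which records actually carry an out-of-range phase. Below is a minimal sketch in Python, assuming the file is tab-separated GFF3 with 9 columns. The function name find_invalid_phases and the two sample records (built from the coordinates posted above) are only for illustration and are not part of gmap or AGAT.

```python
import io

# Phase values the GFF3 format allows: 0, 1, 2, or "." when not applicable.
VALID_PHASES = {"0", "1", "2", "."}

def find_invalid_phases(lines):
    """Yield (line_number, seqid, feature_type, phase) for every record
    whose phase column (column 8) is not one of 0, 1, 2 or '.'."""
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # not a well-formed 9-column record; ignored here
        phase = fields[7]
        if phase not in VALID_PHASES:
            yield number, fields[0], fields[2], phase

# Two illustrative records using the gene and CDS coordinates from the post.
sample = "\n".join([
    "\t".join(["scaffold_6", "crigri_gmap", "gene", "62246878", "62246977",
               ".", "+", ".", "ID=cgr_let_7a_MI0020368.path1"]),
    "\t".join(["scaffold_6", "crigri_gmap", "CDS", "62246878", "62246975",
               "100", "+", "-1", "ID=cgr_let_7a_MI0020368.mrna1.cds1"]),
]) + "\n"

for record in find_invalid_phases(io.StringIO(sample)):
    print(record)
```

To check the real file, pass an open file handle instead of the sample, for example `with open("trial_1.gff3") as handle: bad = list(find_invalid_phases(handle))`.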