Hi all,
I am trying to run featureCounts on miRNA reads from CHO cells for differential expression analysis. I created my own annotation file by mapping the hairpin miRNA sequences downloaded from miRBase (http://www.mirbase.org/cgi-bin/mirna_summary.pl?org=cgr) against the Ensembl reference genome for CHO cells. For that purpose, I used gmap:
gmap -D ~/miRNA/crigri_gmap -d crigri_gmap -f 2 -n 0 -t 16 --gff3-cds=genomic hairpin_crigri_dna_mod.fa > trial_1.gff3
I get a .gff3 file with the 9 expected columns (seqid, source, type, start, end, score, strand, phase and attributes), but for some scaffolds I get a "-1" value in the "phase" field. According to Ensembl (https://www.ensembl.org/info/website/upload/gff3.html), the phase should be 'One of '0', '1' or '2'. '0' indicates that the first base of the feature is the first base of a codon, '1' that the second base is the first base of a codon, and so on.' So I don't know how to interpret a value of -1.
This is how the head of my .gff3 file looks:
scaffold_6 crigri_gmap gene 62246878 62246977 . + . ID=cgr_let_7a_MI0020368.path1;Name=cgr_let_7a_MI0020368
scaffold_6 crigri_gmap mRNA 62246878 62246977 . + . ID=cgr_let_7a_MI0020368.mrna1;Name=cgr_let_7a_MI0020368;Parent=cgr_let_7a_MI0020368.path1;coverage=100.0;identity=100.0;matches=100;mismatches=0;indels=0;unknowns=0
scaffold_6 crigri_gmap exon 1 62246878 100 + . ID=cgr_let_7a_MI0020368.mrna1.exon1;Name=cgr_let_7a_MI0020368;Parent=cgr_let_7a_MI0020368.mrna1;Target=cgr_let_7a_MI0020368 1 1 +
scaffold_6 crigri_gmap CDS 62246878 62246975 100 + -1 ID=cgr_let_7a_MI0020368.mrna1.cds1;Name=cgr_let_7a_MI0020368;Parent=cgr_let_7a_MI0020368.mrna1;Target=cgr_let_7a_MI0020368 1 98 +
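In case it helps to see how widespread the problem is, here is a minimal Python sketch (standard library only) that lists the records whose phase column is not one of '.', '0', '1' or '2'. The function name find_invalid_phase is just a placeholder I made up, and the sketch assumes the real file is tab-delimited as the GFF3 format requires. The two sample records are the gene and CDS lines from above, rebuilt with tabs.

```python
import io

# Values the GFF3 format allows in column 8 (phase); "." means "not applicable".
VALID_PHASES = {".", "0", "1", "2"}


def find_invalid_phase(handle):
    """Yield (line_number, feature_type, phase) for records with an invalid phase.

    Hypothetical helper for illustration; assumes tab-delimited GFF3 input.
    """
    for line_number, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        # Skip blank lines and comment/directive lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # A GFF3 record has exactly 9 columns; ignore anything else.
        if len(fields) != 9:
            continue
        feature_type, phase = fields[2], fields[7]
        if phase not in VALID_PHASES:
            yield line_number, feature_type, phase


# Two records copied from the file head above, joined with tabs.
SAMPLE = "\n".join([
    "\t".join(["scaffold_6", "crigri_gmap", "gene", "62246878", "62246977",
               ".", "+", ".",
               "ID=cgr_let_7a_MI0020368.path1;Name=cgr_let_7a_MI0020368"]),
    "\t".join(["scaffold_6", "crigri_gmap", "CDS", "62246878", "62246975",
               "100", "+", "-1",
               "ID=cgr_let_7a_MI0020368.mrna1.cds1;Name=cgr_let_7a_MI0020368;"
               "Parent=cgr_let_7a_MI0020368.mrna1;"
               "Target=cgr_let_7a_MI0020368 1 98 +"]),
]) + "\n"

for hit in find_invalid_phase(io.StringIO(SAMPLE)):
    print(hit)
```

On the real file, the same function could be called on an open file handle, e.g. with open("trial_1.gff3") as handle. Grouping the hits by feature type would show whether the -1 values only appear on CDS records, as in the excerpt above.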
Any help with this will be much appreciated. Thank you very much!!