Following the RNA-seq analysis workflow, I am trying to find the GO (Gene Ontology) terms for a list of differentially expressed genes (DGEs) produced by featureCounts > edgeR. I ran the RNA-seq analysis using either the RAST-annotated GTF file or the NCBI-PGAP GTF file.
1 - In the RAST GTF file, the majority of genes look like the one below (no locus_tag, no transcript_id):
Scaffold_3 FIG CDS 1598714 1599913 . - 2 ID=fig|6666666.1005592.peg.4310;Name=Quinolone resistance NorA protein
2 - In the NCBI-PGAP GTF file, the majority of genes look like the one below (gene_id = transcript_id = locus_tag):
GeneMarkS-2+ stop_codon 235 237 . + 0 gene_id "JYU28_00005"; transcript_id "unassigned_transcript_1"; gbkey "CDS"; inference "COORDINATES: ab initio prediction:GeneMarkS-2+"; locus_tag "JYU28_00005"; partial "true"; product "IS5/IS1182 family transposase"; protein_id "MBO3282641.1"; transl_table "11"; exon_number "1";
In both cases, the gene IDs do not correspond to any database, not even RefSeq, so I couldn't convert the DGE list to Entrez, Ensembl, or UniProt IDs that I could use for downstream GO enrichment analysis.
I would appreciate any help or suggestions for solving this issue.
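For reference, here is a minimal Python sketch of how the attribute column of each file could be parsed to list the identifiers that are actually present. `parse_attributes` is a hypothetical helper (it is not part of featureCounts or edgeR), and telling the two layouts apart by the presence of double quotes is a simplifying assumption that fits the two example lines above but may not hold for every file.

```python
import re

# GTF-style attribute pairs look like: key "value";
_GTF_PAIR = re.compile(r'(\S+)\s+"([^"]*)"\s*;?')


def parse_attributes(column: str) -> dict:
    """Parse the attribute column of one annotation line into a dict.

    Assumption: a column containing double quotes is GTF-style
    (key "value";), otherwise it is GFF3-style (key=value;key=value).
    """
    column = column.strip()
    if '"' in column:
        return dict(_GTF_PAIR.findall(column))
    attrs = {}
    for field in column.split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


# Attribute columns copied from the two example lines in this post.
rast = parse_attributes(
    "ID=fig|6666666.1005592.peg.4310;Name=Quinolone resistance NorA protein"
)
pgap = parse_attributes(
    'gene_id "JYU28_00005"; transcript_id "unassigned_transcript_1"; '
    'gbkey "CDS"; inference "COORDINATES: ab initio prediction:GeneMarkS-2+"; '
    'locus_tag "JYU28_00005"; partial "true"; '
    'product "IS5/IS1182 family transposase"; protein_id "MBO3282641.1"; '
    'transl_table "11"; exon_number "1";'
)

print(sorted(rast))  # fields available in the RAST line
print(sorted(pgap))  # fields available in the PGAP line
```

With these two lines, the only fields I have to map from are `ID` and `Name` in the RAST file, and `gene_id`, `locus_tag`, `protein_id`, and `product` in the PGAP file.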