Hi all,
I would like to conduct a gene set enrichment analysis of RNA-seq data. The experimental setup: there are three bacteria of interest (let's call them A, B and C). RNA-seq data were obtained from monocultures of A, B and C and from the tri-culture ABC.
The genomes of all three bacteria have been sequenced and annotated with Prokka (https://github.com/tseemann/prokka). The RNA-seq, genome sequencing and annotation were all done by an external company, which also provided a differential gene expression analysis.
My problem is that I cannot find a good way to group the genes into gene sets using the given annotations. Does anyone here have some tips? Preferably I would like to use gene sets based on KEGG pathways, but GO terms or other classifications would do as well. Below I have included an example from my annotations file (.tsv).
locus_tag       ftype  length_bp  gene    EC_number  COG      product
LFFBCOMC_00027  CDS    369                                    hypothetical protein
LFFBCOMC_00028  CDS    1488       feaB_1  1.2.1.39   COG1012  Phenylacetaldehyde dehydrogenase
LFFBCOMC_00029  CDS    1233                                   Outer membrane porin protein 32
Only one of the three strains is currently found in the KEGG database.
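To illustrate the kind of grouping I have in mind, here is a minimal Python sketch that builds gene sets directly from the COG or EC_number column of the Prokka table. It assumes the file is tab-separated with the header shown above; the function name `gene_sets_from_column` and the inline sample are placeholders of mine, not part of Prokka. It only yields one set per COG or EC identifier, so it does not give KEGG pathways by itself.

```python
import csv
import io
from collections import defaultdict

# Rows copied from the annotation excerpt above; empty cells stay empty strings.
SAMPLE_TSV = (
    "locus_tag\tftype\tlength_bp\tgene\tEC_number\tCOG\tproduct\n"
    "LFFBCOMC_00027\tCDS\t369\t\t\t\thypothetical protein\n"
    "LFFBCOMC_00028\tCDS\t1488\tfeaB_1\t1.2.1.39\tCOG1012\tPhenylacetaldehyde dehydrogenase\n"
    "LFFBCOMC_00029\tCDS\t1233\t\t\t\tOuter membrane porin protein 32\n"
)


def gene_sets_from_column(handle, column):
    """Group locus tags by the value in `column`, skipping genes with no value."""
    sets = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        key = (row.get(column) or "").strip()
        if key:
            sets[key].add(row["locus_tag"])
    return dict(sets)


cog_sets = gene_sets_from_column(io.StringIO(SAMPLE_TSV), "COG")
ec_sets = gene_sets_from_column(io.StringIO(SAMPLE_TSV), "EC_number")
print(cog_sets)  # {'COG1012': {'LFFBCOMC_00028'}}
print(ec_sets)   # {'1.2.1.39': {'LFFBCOMC_00028'}}
```

With a real file, `io.StringIO(SAMPLE_TSV)` would be replaced by an open file handle. Genes without a COG or EC number (two of the three rows here) end up in no set at all, which is the gap I am trying to close.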
Thanks in advance!
Thank you! I've looked into it, and eggNOG-mapper seems to be just what I need.
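In case it helps anyone with the same problem, here is a rough sketch of how eggNOG-mapper output could be turned into KEGG pathway gene sets. It rests on assumptions about the annotations file that you should check against your own output, since the layout can differ between versions: tab-separated, comment lines starting with `##`, a header line starting with `#query`, a `KEGG_Pathway` column holding comma-separated IDs, and `-` for missing values. The sample rows and the function name `pathway_sets` are made up for illustration and are not real output.

```python
import csv
import io
from collections import defaultdict

# Illustrative, heavily trimmed stand-in for an eggNOG-mapper annotations file.
# Real files have many more columns; these values are made up for the example.
SAMPLE = (
    "## comment line\n"
    "#query\tKEGG_ko\tKEGG_Pathway\n"
    "LFFBCOMC_00027\t-\t-\n"
    "LFFBCOMC_00028\tko:K00146\tko00360,map00360\n"
)


def pathway_sets(handle, column="KEGG_Pathway", prefix="ko"):
    """Map each pathway ID starting with `prefix` to the set of query genes."""
    # Drop '##' comment lines but keep the '#query' header line.
    lines = (line for line in handle if not line.startswith("##"))
    sets = defaultdict(set)
    for row in csv.DictReader(lines, delimiter="\t"):
        for term in (row.get(column) or "").split(","):
            term = term.strip()
            # Skips '-' (no annotation) and the duplicate 'map' IDs.
            if term.startswith(prefix):
                sets[term].add(row["#query"])
    return dict(sets)


kegg_sets = pathway_sets(io.StringIO(SAMPLE))
print(kegg_sets)  # {'ko00360': {'LFFBCOMC_00028'}}
```

The resulting dictionary can be written out as a two-column term-to-gene table, which is the form that enrichment tools accepting custom gene sets typically expect. Passing a different `column` and `prefix` should cover GO terms in the same way, assuming the corresponding column is present.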