I would like to conduct a gene set enrichment analysis of RNA-seq data. The experimental setup: there are three bacteria of interest (let's call them A, B and C). RNA-seq data was obtained from monocultures of A, B and C and from the tri-culture ABC.
The genomes of all three bacteria are sequenced and annotated with prokka (https://github.com/tseemann/prokka). RNA-seq, genome sequencing and annotation were all done by an external company, which also provided a differential gene expression analysis.
My problem is that I cannot find a good way to group the genes into gene sets using the given annotations. Does anyone here have some tips? Preferably I would like to use gene sets based on KEGG pathways, but GO terms or others could do as well. Below I have included an example from my annotations file (.tsv).
locus_tag       ftype  length_bp  gene    EC_number  COG      product
LFFBCOMC_00027  CDS    369                                    hypothetical protein
LFFBCOMC_00028  CDS    1488       feaB_1  1.2.1.39   COG1012  Phenylacetaldehyde dehydrogenase
LFFBCOMC_00029  CDS    1233                                   Outer membrane porin protein 32
Only one of the three strains is currently found in the KEGG database.
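To make the question concrete, here is a rough sketch of the kind of grouping I have in mind. It assumes the file is tab-separated with the column names shown in the header above, and it simply groups locus tags by whatever is in one annotation column (COG or EC_number). The function name `gene_sets_from_annotations` is just something I made up for illustration, and the EC number in the inline example is the one I believe belongs to feaB. This is not a KEGG mapping, only the shape of the output I am after.

```python
import csv
import io
from collections import defaultdict


def gene_sets_from_annotations(handle, key="COG"):
    """Group locus tags by the value found in the `key` column.

    Rows where that column is empty (e.g. hypothetical proteins) are skipped.
    `handle` is any open text file or file-like object with a header row.
    """
    sets = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        value = (row.get(key) or "").strip()
        if value:
            sets[value].append(row["locus_tag"])
    return dict(sets)


# Inline copy of the example rows so the sketch runs without the real file.
example = (
    "locus_tag\tftype\tlength_bp\tgene\tEC_number\tCOG\tproduct\n"
    "LFFBCOMC_00027\tCDS\t369\t\t\t\thypothetical protein\n"
    "LFFBCOMC_00028\tCDS\t1488\tfeaB_1\t1.2.1.39\tCOG1012\tPhenylacetaldehyde dehydrogenase\n"
    "LFFBCOMC_00029\tCDS\t1233\t\t\t\tOuter membrane porin protein 32\n"
)

cog_sets = gene_sets_from_annotations(io.StringIO(example), key="COG")
ec_sets = gene_sets_from_annotations(io.StringIO(example), key="EC_number")
print(cog_sets)
print(ec_sets)
```

With the real file I would pass `open("annotations.tsv", newline="")` instead of the inline string (the file name is a placeholder). What I am missing is a way to get from these COG or EC identifiers to KEGG pathways or GO terms.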
Thanks in advance!