Question

Gene Set Enrichment Analysis of RNA-seq data based on prokka annotated genomes

4.1 years ago

viktorht • 0

Hi All

I would like to conduct a gene set enrichment analysis of RNA-seq data. The experimental setup: there are three bacteria of interest (let's call them A, B and C). RNA-seq data was obtained from monocultures of A, B and C and from the tri-culture ABC.

The genomes of all three bacteria are sequenced and annotated via prokka (https://github.com/tseemann/prokka). RNA-seq, genome sequencing and annotation were all done by an external company, which also provided a differential expression analysis of the genes.

My problem is that I cannot find a good way to group the genes into gene sets using the given annotations. Does anyone here have some tips? Preferably I would like to use gene sets based on KEGG pathways, but GO terms or others could do as well. Below I have included an example from my annotations file (.tsv).

locus_tag       ftype  length_bp  gene    EC_number  COG      product
LFFBCOMC_00027  CDS    369                                    hypothetical protein
LFFBCOMC_00028  CDS    1488       feaB_1  1.2.1.39   COG1012  Phenylacetaldehyde dehydrogenase
LFFBCOMC_00029  CDS    1233                                   Outer membrane porin protein 32

Only one of the three strains is currently found in the KEGG database.

Thanks in advance!

RNA-Seq GSEA prokka COG • 1.1k views

score 3 · Answer 1 · 2020-03-10

4.1 years ago

Asaf 10k

Prokka will not give you a comprehensive KO mapping. You can run eggnog-mapper to assign your genes to orthologous groups and annotate them with GO terms and KEGG identifiers.
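Once you have the eggnog-mapper annotations table, turning it into gene sets is mostly a matter of inverting the gene-to-term mapping. Here is a minimal Python sketch of that step. It assumes the usual tab-separated layout: comment lines starting with `##`, a header line starting with `#query`, comma-separated terms in columns such as `KEGG_Pathway`, `KEGG_ko` and `GOs`, and `-` for missing values. Check those column names against your own file, since they can differ between versions. The function names and the three example rows are made up for illustration only.

```python
import csv
from collections import defaultdict


def gene_sets_from_emapper(lines, column="KEGG_Pathway", prefix="ko"):
    """Invert a gene -> terms table into term -> set of genes.

    `lines` is any iterable of text lines (e.g. an open file handle).
    `column` and `prefix` are assumptions about the file layout; use
    column="GOs", prefix="GO:" for GO terms instead.
    """
    # Drop '##' comment lines; the '#query' header line is kept.
    rows = (ln for ln in lines if not ln.startswith("##"))
    reader = csv.DictReader(rows, delimiter="\t", quoting=csv.QUOTE_NONE)
    sets = defaultdict(set)
    for row in reader:
        value = row.get(column) or "-"
        if value == "-":  # gene has no annotation in this column
            continue
        for term in value.split(","):
            term = term.strip()
            # Keep one naming scheme only (e.g. 'ko00360', not 'map00360').
            if term.startswith(prefix):
                sets[term].add(row["#query"])
    return dict(sets)


def to_gmt(sets):
    """Format gene sets as GMT lines: name, description, then the genes."""
    return [
        "\t".join([term, "na"] + sorted(genes))
        for term, genes in sorted(sets.items())
    ]


# Invented example rows, not real annotations.
example_lines = [
    "## made-up example input\n",
    "#query\tKEGG_ko\tKEGG_Pathway\tGOs\n",
    "gene_1\t-\t-\t-\n",
    "gene_2\tko:K00001\tko00010,map00010\tGO:0008150\n",
    "gene_3\t-\tko00010,ko00020\t-\n",
]

pathway_sets = gene_sets_from_emapper(example_lines)
gmt_lines = to_gmt(pathway_sets)
```

The GMT lines can be written to a file and passed to most enrichment tools that accept custom gene sets. Since the query names come from whatever FASTA you annotate, running eggnog-mapper on the prokka protein file should keep the locus tags as gene identifiers, so they match your differential expression table.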