The main goal of my research is to find the abundance of genes in a metagenomic sample. That is, I want to know not only which genes are in the sample, but also how many copies of each gene are present (since it is a large metagenomic sample with many bacteria, the abundance of some genes is expected to be quite high).
To keep everything manageable, I want to group genes with very similar functions together. I know that EggNOG can do this. My question is: how do I group my genes while still taking the abundance/depth of each gene into account? EggNOG takes an FAA file as input, which contains only the protein sequences of the genes, not their abundance/depth. I have used BWA to align my raw metagenomic reads to my annotated genes. Is there a way to use this information from BWA? If there is an entirely different, better approach, feel free to speak up!
I have the following data files:
Prokka/Prodigal output files (faa, fsa, fna, etc.) from gene annotation of prokaryotic bins (from metagenomic data).
BWA output files (bam, sam) from aligning the raw reads to my annotated genes (I did this to get the depth of each gene, i.e. how many reads cover each gene).
EggNOG output files (tsv, txt) for the annotated genes (used to categorize the genes into functional groups, to make everything more manageable and easier to visualize), but without gene depth information.
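To make the kind of combination I am asking about concrete, here is a minimal sketch of one possible approach, not a validated pipeline. It assumes that per-gene read counts have been exported from the BAM file with `samtools idxstats` (tab-separated: gene name, gene length, mapped reads, unmapped reads), and that the EggNOG output has been reduced to a two-column, tab-separated table of gene ID and functional group. That two-column layout is an assumption for illustration, since the real EggNOG table has more columns and its layout depends on the version. The function names are hypothetical, and reads per kilobase is just one simple way to correct for gene length.

```python
from collections import defaultdict


def parse_idxstats(lines):
    """Parse `samtools idxstats`-style lines: name, length, mapped, unmapped."""
    stats = {}
    for line in lines:
        name, length, mapped, _unmapped = line.rstrip("\n").split("\t")
        if name == "*":  # trailing line that counts reads without a reference
            continue
        stats[name] = (int(length), int(mapped))
    return stats


def parse_gene_to_group(lines):
    """Parse an ASSUMED two-column table: gene ID, functional group."""
    groups = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment lines and blank lines
        gene, group = line.rstrip("\n").split("\t")[:2]
        groups[gene] = group
    return groups


def group_abundance(stats, gene_to_group, unassigned="unassigned"):
    """Sum length-normalized read counts (reads per kilobase) per group."""
    totals = defaultdict(float)
    for gene, (length, mapped) in stats.items():
        if length == 0:
            continue  # avoid division by zero on malformed entries
        reads_per_kb = mapped / (length / 1000)
        totals[gene_to_group.get(gene, unassigned)] += reads_per_kb
    return dict(totals)


if __name__ == "__main__":
    # Made-up toy data, only to show the expected input shapes.
    idxstats_lines = [
        "gene_1\t1000\t200\t0\n",
        "gene_2\t500\t50\t0\n",
        "gene_3\t2000\t100\t0\n",
        "*\t0\t0\t10\n",
    ]
    annotation_lines = [
        "#query\tgroup\n",
        "gene_1\tCOG0001\n",
        "gene_2\tCOG0001\n",
    ]
    result = group_abundance(
        parse_idxstats(idxstats_lines),
        parse_gene_to_group(annotation_lines),
    )
    for group, value in sorted(result.items()):
        print(group, value)
```

Genes without an EggNOG assignment are collected under "unassigned" rather than dropped, so the total abundance stays comparable before and after grouping.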
Thanks very much in advance!
Please feel free to ask questions if anything is unclear.